Wrapper for keyword search function — keyword

Applies the keyword_search function to every PDF file in a directory, optionally including subdirectories.

keyword_directory(
  directory,
  keyword,
  surround_lines = FALSE,
  ignore_case = FALSE,
  token_results = TRUE,
  split_pdf = FALSE,
  remove_hyphen = TRUE,
  convert_sentence = TRUE,
  remove_equations = TRUE,
  split_pattern = "\\p{WHITE_SPACE}{3,}",
  full_names = TRUE,
  file_pattern = ".pdf",
  recursive = FALSE,
  max_search = NULL,
  ...
)

Arguments

directory: The directory in which to look for PDF files to search.
keyword: The keyword(s) to be used to search in the text. Multiple keywords can be specified with a character vector.
surround_lines: numeric/FALSE indicating whether the output should include the lines of text surrounding each match in addition to the matching line. Default is FALSE; otherwise, supply the number of additional surrounding lines to extract.
ignore_case: TRUE/FALSE/vector of TRUE/FALSE, indicating whether the case of the keyword should be ignored. Default is FALSE, meaning the keyword is matched case-sensitively. If a vector, it must be the same length as the keyword vector.
token_results: TRUE/FALSE indicating whether the results text returned should be split into tokens. See the tokenizers package and convert_tokens for more details. Defaults to TRUE.
split_pdf: TRUE/FALSE indicating whether to split the pdf using white space. This would be most useful with multicolumn pdf files. The split_pdf function attempts to recreate the column layout of the text into a single column starting with the left column and proceeding to the right.
remove_hyphen: TRUE/FALSE indicating whether hyphenated words should be adjusted to combine onto a single line. Default is TRUE.
convert_sentence: TRUE/FALSE indicating if the individual lines of the PDF file should be collapsed into a single large paragraph before keyword searching. Default is TRUE.
remove_equations: TRUE/FALSE indicating if equations should be removed. The default behavior is to search for a literal parenthesis, followed by at least one number, followed by another parenthesis at the end of the text line. This will not detect other patterns, nor will it detect the entire equation if it spans multiple rows.
split_pattern: Regular expression pattern used to split multicolumn PDF files using stringi::stri_split_regex. Default pattern is to split based on three or more consecutive white space characters.
full_names: TRUE/FALSE indicating if the full file path should be used. Default is TRUE, see list.files for more details.
file_pattern: An optional regular expression to select specific file names. Only files that match the regular expression will be searched. Defaults to all pdfs, i.e. ".pdf". See list.files for more details.
recursive: TRUE/FALSE indicating if subdirectories should be searched as well. Default is FALSE, see list.files for more details.
max_search: An optional number giving the maximum number of PDF files to search. Only the first n files are searched.
...: token_function to pass to convert_tokens function.
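
The file-selection arguments (file_pattern, full_names, recursive, max_search) are described in terms of list.files. The following is a minimal sketch of how such a selection could behave, using base R only. It assumes the arguments map directly onto list.files and that max_search keeps the first n files; the directory and file names are hypothetical placeholders, not real PDFs, and this is not the package's actual implementation.

```r
# Throwaway directory with placeholder files (hypothetical names, empty files).
tmp <- file.path(tempdir(), "pdf_demo")
dir.create(file.path(tmp, "sub"), recursive = TRUE, showWarnings = FALSE)
invisible(file.create(file.path(
  tmp, c("a.pdf", "b.pdf", "mypdf_notes.txt", "sub/c.pdf")
)))

# Top-level files only, as with recursive = FALSE.
top <- list.files(tmp, pattern = ".pdf", full.names = TRUE, recursive = FALSE)

# Include subdirectories, as with recursive = TRUE.
all_files <- list.files(tmp, pattern = ".pdf", full.names = TRUE, recursive = TRUE)

# Assumed max_search behavior: keep only the first n files.
first_two <- head(all_files, 2)

# ".pdf" is a regular expression, so "." matches any character and the
# pattern can match anywhere in the name. Anchoring it is stricter.
loose  <- list.files(tmp, pattern = ".pdf")
strict <- list.files(tmp, pattern = "\\.pdf$")

basename(top)
basename(all_files)
loose
strict
```

Because the pattern is a regular expression, a file such as mypdf_notes.txt matches ".pdf" even though it is not a PDF. Passing file_pattern = "\\.pdf$" is one way to restrict the search to names that end in ".pdf".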

Value

A tibble data frame that contains the keyword, the location of each match, the line of text containing the match, and optionally the tokens associated with that line of text. The results for all input PDF files are combined by row into a single data frame.
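
To illustrate what combining by row means here, the sketch below stacks two hypothetical per-file result tables in base R and tags each with a running file ID. The file names and values are made up for illustration; only the column names follow the example output on this page, and the package's own combining code may differ.

```r
# Hypothetical per-file results (values invented for illustration).
per_file <- list(
  data.frame(pdf_name = "first.pdf", keyword = "alpha",
             page_num = 1L, line_num = 3L),
  data.frame(pdf_name = "second.pdf", keyword = c("alpha", "beta"),
             page_num = c(2L, 5L), line_num = c(10L, 41L))
)

# Prepend an ID column identifying the file each row came from.
tagged <- Map(function(df, id) cbind(ID = id, df), per_file, seq_along(per_file))

# Stack the per-file tables into one data frame.
combined <- do.call(rbind, tagged)

# Number of matches per file.
table(combined$pdf_name)
```

The same table() call applied to the pdf_name column of a real result gives a quick count of matches per file.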

Examples

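# load the package and locate the example PDF directory shipped with it
library(pdfsearch)
directory <- system.file('pdf', package = 'pdfsearch')

# search two files for two keywords, also extracting one surrounding line
keyword_directory(directory,
                  keyword = c('repeated measures', 'measurement error'),
                  surround_lines = 1, full_names = TRUE)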
#>    ID       pdf_name           keyword page_num line_num
#> 1   1 1501.00450.pdf repeated measures        1        9
#> 2   1 1501.00450.pdf repeated measures        2       31
#> 3   1 1501.00450.pdf repeated measures        2       58
#> 4   1 1501.00450.pdf repeated measures        2       60
#> 5   1 1501.00450.pdf repeated measures        3       70
#> 6   1 1501.00450.pdf repeated measures        6      169
#> 7   1 1501.00450.pdf repeated measures        6      180
#> 8   1 1501.00450.pdf repeated measures        6      185
#> 9   1 1501.00450.pdf repeated measures        9      315
#> 10  2 1610.00147.pdf measurement error        1        2
#> 11  2 1610.00147.pdf measurement error        1       10
#> 12  2 1610.00147.pdf measurement error        1       12
#> 13  2 1610.00147.pdf measurement error        2       16
#> 14  2 1610.00147.pdf measurement error        2       18
#> 15  2 1610.00147.pdf measurement error        2       19
#> 16  2 1610.00147.pdf measurement error        2       20
#> 17  2 1610.00147.pdf measurement error        3       35
#> 18  2 1610.00147.pdf measurement error        4       42
#> 19  2 1610.00147.pdf measurement error        4       43
#> 20  2 1610.00147.pdf measurement error        4       45
#> 21  2 1610.00147.pdf measurement error        4       51
#> 22  2 1610.00147.pdf measurement error        5       60
#> 23  2 1610.00147.pdf measurement error        6       76
#> 24  2 1610.00147.pdf measurement error        7       98
#> 25  2 1610.00147.pdf measurement error        8      111
#> 26  2 1610.00147.pdf measurement error       12      158
#> 27  2 1610.00147.pdf measurement error       12      163
#> 28  2 1610.00147.pdf measurement error       14      192
#> 29  2 1610.00147.pdf measurement error       14      194
#> 30  2 1610.00147.pdf measurement error       15      206
#> 31  2 1610.00147.pdf measurement error       17      228
#> 32  2 1610.00147.pdf measurement error       17      237
#> 33  2 1610.00147.pdf measurement error       18      248
#> 34  2 1610.00147.pdf measurement error       20      272
#> 35  2 1610.00147.pdf measurement error       21      298
#> 36  2 1610.00147.pdf measurement error       22      310
#> 37  2 1610.00147.pdf measurement error       22      311
#> 38  2 1610.00147.pdf measurement error       23      322
#> 39  2 1610.00147.pdf measurement error       24      337
#> 40  2 1610.00147.pdf measurement error       24      338
#> 41  2 1610.00147.pdf measurement error       24      339
#> 42  2 1610.00147.pdf measurement error       24      340
#> 43  2 1610.00147.pdf measurement error       25      344
#> 44  2 1610.00147.pdf measurement error       25      346
#> 45  2 1610.00147.pdf measurement error       29      458
#>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      line_text
#> 1                                                                                                                                                                                                                                                                                                                                                                                                                                                              We             Running under powered experiments have many perils. , Not introduce more sophisticated experimental designs, specifi-           only would we miss potentially beneficial effects, we may also cally the repeated measures design, including the crossover           get false confidence about lack of negative effects. , Statistical design and related variants, to increase KPI sensitivity with         power increases with larger effect size, and smaller variances. the same traffic size and duration of experiment. 
#> 2                                                                                                                                                                                                                                                                   a limitation to any online experimentation platform, where         within-subject variation. , We also discuss practical considfast iterations and testing many ideas can reap the most           erations to repeated measures design, with variants to the rewards.                                                           crossover design to study the carry over effect, including the “re-randomized” design (row 5 in table 1). , 1.1    Motivation To improve sensitivity of measurement, apart from accurate         1.2     Main Contributions implementation and increase sample size and duration, we           In this paper, we propose a framework called FORME (Flexcan employ statistical methods to reduce variance. 
#> 3                                                                                                                                                                                                                                                                                                                                                                                                                                                        In the Table 1: Repeated Measures Designs                        following section we assume the minimum experimentation “period” to be one full week, and may extend to up to two In this paper we extend the idea further by employing the          weeks. , To facilitate our illustration, in all the derivation repeated measures design in different stages of treatment          in this section we assume all users appear in all periods, assignment. , The traditional A/B test can be analyzed us-           i.e. no missing measurement. 
#> 4                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             The traditional A/B test can be analyzed us-           i.e. no missing measurement. , We also restrict ourselves ing the repeated measures analysis, reporting a “per week”         to metrics that are defined as simple average and assume treatment effect, as show in row 3 “parallel” design in ta-        treatment and control have the same sample size. , We furble 1. 
#> 5                                                                                                                                                                                                                                                                                                This way              average treatment effect (ATE) δ = µT − µC which is a each user serves as his/her own control in the measurement.        fixed effects in the model in this section. , This way, various In fact, the crossover design is a type of repeated measures       designs considered can be examined in the same framework design commonly used in biomedical research to control for         and easily compared., We will proceed to show, with theoretical derivations, that        2.1    Two Sample T-test given the same total traffic                                       Let X denote the observed average metric value in control group and Y denote that in the treatment group. 
#> 6             5.  , FLEXIBLE AND SCALABLE REPEATED One way to see measurements are not missing at random is                MEASURES ANALYSIS VIA FORME to realize infrequent users are more likely to have missing         5.1 Review of Existing Methods values and the absence in a specific time window can still          It is common to analyze data from repeated measures design provide information on the user behavior and in reality there       with the repeated measures ANOVA model and the F-test, might be other factors causing user to be missing that are          under certain assumptions, such as normality, sphericity (honot even observed. , Instead of throwing away data points             mogeneity of variances in differences between each pair of where user appeared in only one period and is exposed to            within-subject values), equal time points between subjects, only one of the two treatments, in practice, we included an         and no missing data. 
#> 7  X and Z are covariates in the model. , \022P            P            \023          In our cases they are indicators of treatment assignment, k Xik Pk0 Xi k 0 0 Cov(Xi , Xi ) = Cov 0          P        ,                         periods of the measurement, user id, and any other covariate. k Iik     k 0 Ii k 0 0 \022           \023                         As an example, one possible model for repeated measures Xi Xi0                              using lme4’s formula syntax (Bates et al. 2012a;b) is = Cov       , Ii Ii0                                   Y ∼ 1 + IsT reatment + P eriod + (1|U serID), where the last equality is by dividing both numerator and de-       where the only difference of this model to the usual linnominator by the same total number of users who have ever           ear model behind two sample test is the extra random efappeared in the experiments. , Thanks to the central limit            fect(clustered by UserID) to model user “baseline”. 
#> 8                                                                                                                                                                                                                                                                                                                                                                                                                                                                           (2013, Appendix B) for            Random effect makes modeling within-subject variability a similar example; also see (Van der Vaart 2000) for a text         possible. , In repeated measures data, users might appear in book treatment of the delta-method.                                 multiple periods, represented as multiple rows in the dataset. , As a result, rows of the dataset are not independent but 4.2    Metrics Beyond Average                                       with dependencies clustered by user. 
#> 9                                                                                                                                                                                                                                                                                                                                                                                                                                           • Re-randomized: If we suspect the presence of car7. , PRACTICAL CONSIDERATIONS                                          ryover effect, the re-randomized design enables us to At the design stage, we face a few choices under the same               measure it directly and should be used here. framework of repeated measures design. , Experimenters should           • Wash-out and decide: If we have little informause domain knowledge and past experiments to inform the                 tion to judge carry over effect, we can run the first design. 
#> 10                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         Data Fusion for Correcting Measurement Errors Tracy Schifeling, Jerome P. , Reiter, Maria DeYoreo∗ arXiv:1610.00147v1 [stat.ME] 1 Oct 2016 Abstract Often in surveys, key items are subject to measurement errors. , Given just the data, it can be difficult to determine the distribution of this error process, and hence to obtain accurate inferences that involve the error-prone variables. 
#> 11                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     In doing so, we account for the informative sampling design used to select the National Survey of College Graduates. , We also present a process for assessing the sensitivity of various analyses to different choices for the measurement error models. , Supplemental material is available online. 
#> 12                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               Supplemental material is available online. , KEY WORDS: fusion, imputation, measurement error, missing, survey. , ∗ This research was supported by The National Science Foundation under award SES-11-31897. 
#> 13                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          1, 1     Introduction Survey data often contain items that are subject to measurement errors. , For example, some respondents might misunderstand a question or accidentally select the wrong response, thereby providing values unequal to their factual values. 
#> 14                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    For example, some respondents might misunderstand a question or accidentally select the wrong response, thereby providing values unequal to their factual values. , Left uncorrected, these measurement errors can result in degraded inferences (Kim et al., 2015). , Unfortunately, the distribution of the measurement errors typically is not estimable from the survey data alone. 
#> 15                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       Left uncorrected, these measurement errors can result in degraded inferences (Kim et al., 2015). , Unfortunately, the distribution of the measurement errors typically is not estimable from the survey data alone. , One either needs to make strong assumptions about the measurement error process (e.g., as in Curran and Hussong, 2009), or leverage information from some other source of data, as we do here. 
#> 16                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      Unfortunately, the distribution of the measurement errors typically is not estimable from the survey data alone. , One either needs to make strong assumptions about the measurement error process (e.g., as in Curran and Hussong, 2009), or leverage information from some other source of data, as we do here. , One natural source of information is a validation sample, i.e., a dataset with both the reported, possibly erroneous values and the true values measured on the same individuals. 
#> 17                                                                                                                                                                                                                                                                                                                                                                                     It does not make sense to alter every individual’s reported values in the survey, as would be done using a conditional independence approach. , In this article, we develop a framework for leveraging information from gold standard data to improve inferences in surveys subject to measurement errors. , The basic idea is to encode plausible assumptions about the error process, e.g., most people do not make errors when reporting educational attainments, and the reporting process, e.g., when people make errors, they are more likely to report higher attainments than actual, into statistical models. 
#> 18                                                                                                                                                                                                                                                                                                                                                                                                                                                                         example of misreporting of educational attainment in data collected by the Census Bureau, so as to motivate the methodological developments. , In Section 3, we introduce the general framework for specifying measurement error models to leverage the information in gold standard data. , In Section 4, we apply the framework to handle potential measurement error in educational attainment in the 2010 American Community Survey (ACS), using the 2010 National Survey of College Graduates (NSCG) as a gold standard file. 
#> 19                                                                                                                                                                                                                                                                                                                                                                                                                                                                        In Section 3, we introduce the general framework for specifying measurement error models to leverage the information in gold standard data. , In Section 4, we apply the framework to handle potential measurement error in educational attainment in the 2010 American Community Survey (ACS), using the 2010 National Survey of College Graduates (NSCG) as a gold standard file. , In doing so, we deal with a key complication in the data integration: accounting for the informative sampling design used to sample the NSCG. 
#> 20                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              In doing so, we deal with a key complication in the data integration: accounting for the informative sampling design used to sample the NSCG. , We also demonstrate how the framework facilitates analysis of the sensitivity of conclusions to different measurement error model specifications. , In Section 5, we provide a brief summary. 
#> 21                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    These questions greatly reduce the possibility of respondent error, so that the educational attainment values in the NSCG can be considered a gold standard (Black et al., 2003). , The census long form, in contrast, did not include detailed follow up questions, so that reported educational attainment is prone to measurement error. , The Census Bureau linked each individual in the NSCG to their corresponding record in the long form data. 
#> 22                                                                                                                                                                          Census-reported education z           }|           {           BA     MA       Prof   PhD        Total BA      89580   4109 1241       249       95179 NSCG   MA        1218 33928       655   526       36327  reported Prof        382    359 8648      563        9952 education   PhD          99    193     452 6726         7470  Total     91279 38589 10996 8064           148928 No Degree    10150    1792      2040    337    14319 Other    33368   10912      4710   2406    51396 1993). , Because of the linkages, we can characterize the actual measurement error mechanism for educational attainment in the 1990 long form data. , In the NSCG, we treat the highest degree of the three most recent degrees reported (coded as “ed6c1”, “ed6c2”, and “ed6c3” in the file) as the true education level. 
#> 23                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              Of the individuals in the NSCG who had at least a college degree at the time of the 1990 census, about 93.3% of them have the same contemporaneous education levels in both files. , This suggests that most people report correctly, an observation we want to leverage when constructing measurement error models for education in the 2010 ACS. , In most situations, we do not have the good fortune of observing individuals’ errorprone and true values simultaneously. 
#> 24                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     Additionally, DE can include variables for which there is no corresponding variable in DG . , These variables do not play a role in the measurement error modeling, although they can be used in multiple imputation inferences. , We seek to estimate Pr(Y, Z | X), and use it to create multiple imputations for the missing values in Y for the individuals in DE . 
#> 25                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         For individual i, the full data likelihood (omitting parameters for simplicity) can be factored as Pr(Yi = k, Zi = l | Xi ) = Pr(Yi = k | Xi ) × Pr(Ei = e|Yi = k, Xi )Pr(Zi = l|Ei = e, Yi = k, Xi ). , (1) This separates the true data generation process and the measurement error generation process, which facilitates model specification. , In particular, we can use DG to estimate the true data distribution Pr(Y | X). 
#> 26                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     Without linked data, analysts cannot use exploratory data analysis to inform the model choice. , Instead, we recommend that analysts posit scientifically defensible measurement error models, and make post-hoc checks of the sensibility of analyses from those models. , We demonstrate this approach in Section 4. 
#> 27                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              This is akin to diagnostics in multiple imputation for missing data that compare imputed and observed values (Abayomi et al., 2008). , When these distributions differ substantially, it suggests the measurement error model specification (or possibly the true data model) is inadequate. , Such diagnostic checks only can reveal problems with the model specification; they do not indicate that a particular specification is correct. 
#> 28                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         In the NSCG, we discarded 38 records with race suppressed, leaving a sample size of nG = 77, 150. , We consider two sets of measurement error model specifications. , The first set uses specifications like those in Section 3, with flat prior distributions for all parameters. 
#> 29                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          The first set uses specifications like those in Section 3, with flat prior distributions for all parameters. , We use this set to illustrate model diagnostics and sensitivity analysis absent prior information about the measurement error process. , The second set uses a common error and reporting model with different, informative prior distributions on its parameters. 
#> 30                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              First, we use survey-weighted inferences to estimate population totals of (Y | X) from the 2010 NSCG. , Second, we turn these estimates into an approximate Bayesian posterior distribution for input to fitting the measurement error models used to impute plausible values of Yi for individuals in the ACS. , We now describe this process, which can be used generally when DG is collected via a complex survey design. 
#> 31                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        More precisely, let the ACS design-based estimator for Tx+ 16, Table 2: Summary of the first four measurement error model specifications for 2010 NSCG/ACS analysis. , These models use flat prior distributions on all parameters. 
#> 32                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              We include an example of this entire procedure in the supplementary material. , 4.2     Measurement error models The two sets of measurement error models include four that use flat prior distributions and three that use informative prior distributions based on the 1993 linked data. , For all error models, we use a logistic regression of Ei on various main effects and interactions of Yi and Xi . 
#> 33                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               In Model 4, the error and reporting models both depend on Y and sex. , For Models 5 – 7, we use the specification in Model 4 and incorporate prior information about the measurement errors from the 1993 linked data. , In constructing the priors, we first remove records that have been flagged as having missing education that has been imputed, because these imputations might not closely reflect the actual education values (Black et al., 2003). 
#> 34                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          19, 4.3     Empirical results We first examine what each model suggests about the extent and nature of the measurement errors in the 2010 ACS. , We then use the models to assess sensitivity of results about the substantive questions related to number of degrees and income. 
#> 35                                                                                                                                                                                                                                                                                                                                                                                                              We provide more thorough investigation of the impact of the prior specifications in the supplementary material. , Of course, we cannot be certain which model most closely reflects the true measurement error mechanism. , The best we can do is perform diagnostic tests to see which models, if any, should be discounted as not adequately describing the observed data. (m) For each ACS imputed dataset DE         under each model, we compute the sample pro(m)  portions, π̂xk , and corresponding multiple imputation 95% confidence intervals for all 165̇ unique values of (X, Y ). 
#> 36                                                                                                                                                                                                                                                                                                                                                                                                                       It seems plausible that the probability of misreporting education, as well as the reported value itself when errors are made, depend on both sex and true education level. , Additionally, the prior distribution from the 1993 linked data pulls estimates in groups with little sample size to measurement error distributions that seem more plausible on face value. , However, one need not use the data fusion framework for measurement error to select a single model; rather, one can use the framework to examine sensitivity of analyses to the different specifications. 
#> 37                                                                                                                                                                                                                                                                                                                                                                         Additionally, the prior distribution from the 1993 linked data pulls estimates in groups with little sample size to measurement error distributions that seem more plausible on face value. , However, one need not use the data fusion framework for measurement error to select a single model; rather, one can use the framework to examine sensitivity of analyses to the different specifications. , 4.3.2   Sensitivity analyses Figure 2 displays the multiply-imputed, survey-weighted inferences for the total number of women with science and engineering degrees, computing using the ACS-specific indicator variable. 
#> 38                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   We note that using the ACS-reported education without adjustments results in substantially higher estimated totals at the professional and Ph. , D. levels than any of the models that account for measurement error. , We also note that the CIA model yields considerably lower counts for all but bachelor’s degrees. 
#> 39                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  D. recipients than the other models. , 5     Concluding Remarks The framework presented in this article offers analysts tools for using the information in a high quality, separate data source to adjust for measurement errors in the database of interest. , Key to the framework is to replace conditional independence assumptions typically used in data fusion with carefully considered measurement error models. 
#> 40                                                                                                                                                                                                                                                                                                                                                                                                                                                                       5     Concluding Remarks The framework presented in this article offers analysts tools for using the information in a high quality, separate data source to adjust for measurement errors in the database of interest. , Key to the framework is to replace conditional independence assumptions typically used in data fusion with carefully considered measurement error models. , This avoids sacrificing information and facilitates analysis of the sensitivity of conclusions to alternative measurement error specifications. 
#> 41                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      Key to the framework is to replace conditional independence assumptions typically used in data fusion with carefully considered measurement error models. , This avoids sacrificing information and facilitates analysis of the sensitivity of conclusions to alternative measurement error specifications. , Analysts can use diagnostic tests to rule out some measurement error models, and perform sensibility tests on others to identify reasonable candidates. 
#> 42                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              This avoids sacrificing information and facilitates analysis of the sensitivity of conclusions to alternative measurement error specifications. , Analysts can use diagnostic tests to rule out some measurement error models, and perform sensibility tests on others to identify reasonable candidates. , 24
#> 43                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   The ACS estimate is the survey-weighted estimate based on the reported education level in the 2010 ACS. , Besides survey sampling contexts like the one considered here involving the ACS and NSCG, the framework offers potential approaches for dealing with possible measurement errors in organic (big) data. , This is increasingly important, as data stewards and analysts consider replacing or supplementing high quality but expensive surveys with inexpensive and large-sample organic data. 
#> 44                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 This is increasingly important, as data stewards and analysts consider replacing or supplementing high quality but expensive surveys with inexpensive and large-sample organic data. , Often, scant attention is paid to the potential impact of measurement errors on inferences from those data. , The framework could be used with high quality, validated surveys as the gold standard data, allowing for adjustments to the error-prone organic data. 
#> 45                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                M. , (2005), “Imputation of binary treatment variables with measurement error in administrative data,” Journal of the American Statistical Association, 100, 1123–1132. , 29
#>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           token_text
#> 1                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        we, running, under, powered, experiments, have, many, perils, not, introduce, more, sophisticated, experimental, designs, specifi, only, would, we, miss, potentially, beneficial, effects, we, may, also, cally, the, repeated, measures, design, including, the, crossover, get, false, confidence, about, lack, of, negative, effects, statistical, design, and, related, variants, to, increase, kpi, sensitivity, with, power, increases, with, larger, effect, size, and, smaller, variances, the, same, traffic, size, and, duration, of, experiment
#> 2                                                                                                                                                                                                                                                                                                                                         a, limitation, to, any, online, experimentation, platform, where, within, subject, variation, we, also, discuss, practical, considfast, iterations, and, testing, many, ideas, can, reap, the, most, erations, to, repeated, measures, design, with, variants, to, the, rewards, crossover, design, to, study, the, carry, over, effect, including, the, re, randomized, design, row, 5, in, table, 1, 1.1, motivation, to, improve, sensitivity, of, measurement, apart, from, accurate, 1.2, main, contributions, implementation, and, increase, sample, size, and, duration, we, in, this, paper, we, propose, a, framework, called, forme, flexcan, employ, statistical, methods, to, reduce, variance
#> 3                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  in, the, table, 1, repeated, measures, designs, following, section, we, assume, the, minimum, experimentation, period, to, be, one, full, week, and, may, extend, to, up, to, two, in, this, paper, we, extend, the, idea, further, by, employing, the, weeks, to, facilitate, our, illustration, in, all, the, derivation, repeated, measures, design, in, different, stages, of, treatment, in, this, section, we, assume, all, users, appear, in, all, periods, assignment, the, traditional, a, b, test, can, be, analyzed, us, i.e, no, missing, measurement
#> 4                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    the, traditional, a, b, test, can, be, analyzed, us, i.e, no, missing, measurement, we, also, restrict, ourselves, ing, the, repeated, measures, analysis, reporting, a, per, week, to, metrics, that, are, defined, as, simple, average, and, assume, treatment, effect, as, show, in, row, 3, parallel, design, in, ta, treatment, and, control, have, the, same, sample, size, we, furble, 1
#> 5                                                                                                                                                                                                                                                                                                                                   this, way, average, treatment, effect, ate, δ, µt, µc, which, is, a, each, user, serves, as, his, her, own, control, in, the, measurement, fixed, effects, in, the, model, in, this, section, this, way, various, in, fact, the, crossover, design, is, a, type, of, repeated, measures, designs, considered, can, be, examined, in, the, same, framework, design, commonly, used, in, biomedical, research, to, control, for, and, easily, compared, we, will, proceed, to, show, with, theoretical, derivations, that, 2.1, two, sample, t, test, given, the, same, total, traffic, let, x, denote, the, observed, average, metric, value, in, control, group, and, y, denote, that, in, the, treatment, group
#> 6  5, flexible, and, scalable, repeated, one, way, to, see, measurements, are, not, missing, at, random, is, measures, analysis, via, forme, to, realize, infrequent, users, are, more, likely, to, have, missing, 5.1, review, of, existing, methods, values, and, the, absence, in, a, specific, time, window, can, still, it, is, common, to, analyze, data, from, repeated, measures, design, provide, information, on, the, user, behavior, and, in, reality, there, with, the, repeated, measures, anova, model, and, the, f, test, might, be, other, factors, causing, user, to, be, missing, that, are, under, certain, assumptions, such, as, normality, sphericity, honot, even, observed, instead, of, throwing, away, data, points, mogeneity, of, variances, in, differences, between, each, pair, of, where, user, appeared, in, only, one, period, and, is, exposed, to, within, subject, values, equal, time, points, between, subjects, only, one, of, the, two, treatments, in, practice, we, included, an, and, no, missing, data
#> 7                                                                                                                                                                     x, and, z, are, covariates, in, the, model, p, p, in, our, cases, they, are, indicators, of, treatment, assignment, k, xik, pk0, xi, k, 0, 0, cov, xi, xi, cov, 0, p, periods, of, the, measurement, user, id, and, any, other, covariate, k, iik, k, 0, ii, k, 0, 0, as, an, example, one, possible, model, for, repeated, measures, xi, xi0, using, lme4, s, formula, syntax, bates, et, al, 2012a, b, is, cov, ii, ii0, y, 1, ist, reatment, p, eriod, 1, u, serid, where, the, last, equality, is, by, dividing, both, numerator, and, de, where, the, only, difference, of, this, model, to, the, usual, linnominator, by, the, same, total, number, of, users, who, have, ever, ear, model, behind, two, sample, test, is, the, extra, random, efappeared, in, the, experiments, thanks, to, the, central, limit, fect, clustered, by, userid, to, model, user, baseline
#> 8                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             2013, appendix, b, for, random, effect, makes, modeling, within, subject, variability, a, similar, example, also, see, van, der, vaart, 2000, for, a, text, possible, in, repeated, measures, data, users, might, appear, in, book, treatment, of, the, delta, method, multiple, periods, represented, as, multiple, rows, in, the, dataset, as, a, result, rows, of, the, dataset, are, not, independent, but, 4.2, metrics, beyond, average, with, dependencies, clustered, by, user
#> 9                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         re, randomized, if, we, suspect, the, presence, of, car7, practical, considerations, ryover, effect, the, re, randomized, design, enables, us, to, at, the, design, stage, we, face, a, few, choices, under, the, same, measure, it, directly, and, should, be, used, here, framework, of, repeated, measures, design, experimenters, should, wash, out, and, decide, if, we, have, little, informause, domain, knowledge, and, past, experiments, to, inform, the, tion, to, judge, carry, over, effect, we, can, run, the, first, design
#> 10                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      data, fusion, for, correcting, measurement, errors, tracy, schifeling, jerome, p, reiter, maria, deyoreo, arxiv, 1610.00147v1, stat.me, 1, oct, 2016, abstract, often, in, surveys, key, items, are, subject, to, measurement, errors, given, just, the, data, it, can, be, difficult, to, determine, the, distribution, of, this, error, process, and, hence, to, obtain, accurate, inferences, that, involve, the, error, prone, variables
#> 11                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         in, doing, so, we, account, for, the, informative, sampling, design, used, to, select, the, national, survey, of, college, graduates, we, also, present, a, process, for, assessing, the, sensitivity, of, various, analyses, to, different, choices, for, the, measurement, error, models, supplemental, material, is, available, online
#> 12                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          supplemental, material, is, available, online, key, words, fusion, imputation, measurement, error, missing, survey, this, research, was, supported, by, the, national, science, foundation, under, award, ses, 11, 31897
#> 13                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         1, 1, introduction, survey, data, often, contain, items, that, are, subject, to, measurement, errors, for, example, some, respondents, might, misunderstand, a, question, or, accidentally, select, the, wrong, response, thereby, providing, values, unequal, to, their, factual, values
#> 14                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       for, example, some, respondents, might, misunderstand, a, question, or, accidentally, select, the, wrong, response, thereby, providing, values, unequal, to, their, factual, values, left, uncorrected, these, measurement, errors, can, result, in, degraded, inferences, kim, et, al, 2015, unfortunately, the, distribution, of, the, measurement, errors, typically, is, not, estimable, from, the, survey, data, alone
#> 15                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     left, uncorrected, these, measurement, errors, can, result, in, degraded, inferences, kim, et, al, 2015, unfortunately, the, distribution, of, the, measurement, errors, typically, is, not, estimable, from, the, survey, data, alone, one, either, needs, to, make, strong, assumptions, about, the, measurement, error, process, e.g, as, in, curran, and, hussong, 2009, or, leverage, information, from, some, other, source, of, data, as, we, do, here
#> 16                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     unfortunately, the, distribution, of, the, measurement, errors, typically, is, not, estimable, from, the, survey, data, alone, one, either, needs, to, make, strong, assumptions, about, the, measurement, error, process, e.g, as, in, curran, and, hussong, 2009, or, leverage, information, from, some, other, source, of, data, as, we, do, here, one, natural, source, of, information, is, a, validation, sample, i.e, a, dataset, with, both, the, reported, possibly, erroneous, values, and, the, true, values, measured, on, the, same, individuals
#> 17                                                                                                                                                                                                                                                                                                                                                    it, does, not, make, sense, to, alter, every, individual’s, reported, values, in, the, survey, as, would, be, done, using, a, conditional, independence, approach, in, this, article, we, develop, a, framework, for, leveraging, information, from, gold, standard, data, to, improve, inferences, in, surveys, subject, to, measurement, errors, the, basic, idea, is, to, encode, plausible, assumptions, about, the, error, process, e.g, most, people, do, not, make, errors, when, reporting, educational, attainments, and, the, reporting, process, e.g, when, people, make, errors, they, are, more, likely, to, report, higher, attainments, than, actual, into, statistical, models
#> 18                                                                                                                                                                                                                                                                                                                                                                                                                                                   example, of, misreporting, of, educational, attainment, in, data, collected, by, the, census, bureau, so, as, to, motivate, the, methodological, developments, in, section, 3, we, introduce, the, general, framework, for, specifying, measurement, error, models, to, leverage, the, information, in, gold, standard, data, in, section, 4, we, apply, the, framework, to, handle, potential, measurement, error, in, educational, attainment, in, the, 2010, american, community, survey, acs, using, the, 2010, national, survey, of, college, graduates, nscg, as, a, gold, standard, file
#> 19                                                                                                                                                                                                                                                                                                                                                                                                                                               in, section, 3, we, introduce, the, general, framework, for, specifying, measurement, error, models, to, leverage, the, information, in, gold, standard, data, in, section, 4, we, apply, the, framework, to, handle, potential, measurement, error, in, educational, attainment, in, the, 2010, american, community, survey, acs, using, the, 2010, national, survey, of, college, graduates, nscg, as, a, gold, standard, file, in, doing, so, we, deal, with, a, key, complication, in, the, data, integration, accounting, for, the, informative, sampling, design, used, to, sample, the, nscg
#> 20                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             in, doing, so, we, deal, with, a, key, complication, in, the, data, integration, accounting, for, the, informative, sampling, design, used, to, sample, the, nscg, we, also, demonstrate, how, the, framework, facilitates, analysis, of, the, sensitivity, of, conclusions, to, different, measurement, error, model, specifications, in, section, 5, we, provide, a, brief, summary
#> 21                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      these, questions, greatly, reduce, the, possibility, of, respondent, error, so, that, the, educational, attainment, values, in, the, nscg, can, be, considered, a, gold, standard, black, et, al, 2003, the, census, long, form, in, contrast, did, not, include, detailed, follow, up, questions, so, that, reported, educational, attainment, is, prone, to, measurement, error, the, census, bureau, linked, each, individual, in, the, nscg, to, their, corresponding, record, in, the, long, form, data
#> 22                                                                                                                                                                                                                                                                                                                                   census, reported, education, z, ba, ma, prof, phd, total, ba, 89580, 4109, 1241, 249, 95179, nscg, ma, 1218, 33928, 655, 526, 36327, reported, prof, 382, 359, 8648, 563, 9952, education, phd, 99, 193, 452, 6726, 7470, total, 91279, 38589, 10996, 8064, 148928, no, degree, 10150, 1792, 2040, 337, 14319, other, 33368, 10912, 4710, 2406, 51396, 1993, because, of, the, linkages, we, can, characterize, the, actual, measurement, error, mechanism, for, educational, attainment, in, the, 1990, long, form, data, in, the, nscg, we, treat, the, highest, degree, of, the, three, most, recent, degrees, reported, coded, as, ed6c1, ed6c2, and, ed6c3, in, the, file, as, the, true, education, level
#> 23                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       of, the, individuals, in, the, nscg, who, had, at, least, a, college, degree, at, the, time, of, the, 1990, census, about, 93.3, of, them, have, the, same, contemporaneous, education, levels, in, both, files, this, suggests, that, most, people, report, correctly, an, observation, we, want, to, leverage, when, constructing, measurement, error, models, for, education, in, the, 2010, acs, in, most, situations, we, do, not, have, the, good, fortune, of, observing, individuals, errorprone, and, true, values, simultaneously
#> 24                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                additionally, de, can, include, variables, for, which, there, is, no, corresponding, variable, in, dg, these, variables, do, not, play, a, role, in, the, measurement, error, modeling, although, they, can, be, used, in, multiple, imputation, inferences, we, seek, to, estimate, pr, y, z, x, and, use, it, to, create, multiple, imputations, for, the, missing, values, in, y, for, the, individuals, in, de
#> 25                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             for, individual, i, the, full, data, likelihood, omitting, parameters, for, simplicity, can, be, factored, as, pr, yi, k, zi, l, xi, pr, yi, k, xi, pr, ei, e, yi, k, xi, pr, zi, l, ei, e, yi, k, xi, 1, this, separates, the, true, data, generation, process, and, the, measurement, error, generation, process, which, facilitates, model, specification, in, particular, we, can, use, dg, to, estimate, the, true, data, distribution, pr, y, x
#> 26                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          without, linked, data, analysts, cannot, use, exploratory, data, analysis, to, inform, the, model, choice, instead, we, recommend, that, analysts, posit, scientifically, defensible, measurement, error, models, and, make, post, hoc, checks, of, the, sensibility, of, analyses, from, those, models, we, demonstrate, this, approach, in, section, 4
#> 27                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       this, is, akin, to, diagnostics, in, multiple, imputation, for, missing, data, that, compare, imputed, and, observed, values, abayomi, et, al, 2008, when, these, distributions, differ, substantially, it, suggests, the, measurement, error, model, specification, or, possibly, the, true, data, model, is, inadequate, such, diagnostic, checks, only, can, reveal, problems, with, the, model, specification, they, do, not, indicate, that, a, particular, specification, is, correct
#> 28                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  in, the, nscg, we, discarded, 38, records, with, race, suppressed, leaving, a, sample, size, of, ng, 77, 150, we, consider, two, sets, of, measurement, error, model, specifications, the, first, set, uses, specifications, like, those, in, section, 3, with, flat, prior, distributions, for, all, parameters
#> 29                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     the, first, set, uses, specifications, like, those, in, section, 3, with, flat, prior, distributions, for, all, parameters, we, use, this, set, to, illustrate, model, diagnostics, and, sensitivity, analysis, absent, prior, information, about, the, measurement, error, process, the, second, set, uses, a, common, error, and, reporting, model, with, different, informative, prior, distributions, on, its, parameters
#> 30                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 first, we, use, survey, weighted, inferences, to, estimate, population, totals, of, y, x, from, the, 2010, nscg, second, we, turn, these, estimates, into, an, approximate, bayesian, posterior, distribution, for, input, to, fitting, the, measurement, error, models, used, to, impute, plausible, values, of, yi, for, individuals, in, the, acs, we, now, describe, this, process, which, can, be, used, generally, when, dg, is, collected, via, a, complex, survey, design
#> 31                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    more, precisely, let, the, acs, design, based, estimator, for, tx, 16, table, 2, summary, of, the, first, four, measurement, error, model, specifications, for, 2010, nscg, acs, analysis, these, models, use, flat, prior, distributions, on, all, parameters
#> 32                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   we, include, an, example, of, this, entire, procedure, in, the, supplementary, material, 4.2, measurement, error, models, the, two, sets, of, measurement, error, models, include, four, that, use, flat, prior, distributions, and, three, that, use, informative, prior, distributions, based, on, the, 1993, linked, data, for, all, error, models, we, use, a, logistic, regression, of, ei, on, various, main, effects, and, interactions, of, yi, and, xi
#> 33                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               in, model, 4, the, error, and, reporting, models, both, depend, on, y, and, sex, for, models, 5, 7, we, use, the, specification, in, model, 4, and, incorporate, prior, information, about, the, measurement, errors, from, the, 1993, linked, data, in, constructing, the, priors, we, first, remove, records, that, have, been, flagged, as, having, missing, education, that, has, been, imputed, because, these, imputations, might, not, closely, reflect, the, actual, education, values, black, et, al, 2003
#> 34                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              19, 4.3, empirical, results, we, first, examine, what, each, model, suggests, about, the, extent, and, nature, of, the, measurement, errors, in, the, 2010, acs, we, then, use, the, models, to, assess, sensitivity, of, results, about, the, substantive, questions, related, to, number, of, degrees, and, income
#> 35                                                                                                                                                                                                                                                                                                                                                                                            we, provide, more, thorough, investigation, of, the, impact, of, the, prior, specifications, in, the, supplementary, material, of, course, we, cannot, be, certain, which, model, most, closely, reflects, the, true, measurement, error, mechanism, the, best, we, can, do, is, perform, diagnostic, tests, to, see, which, models, if, any, should, be, discounted, as, not, adequately, describing, the, observed, data, m, for, each, acs, imputed, dataset, de, under, each, model, we, compute, the, sample, pro, m, portions, π̂xk, and, corresponding, multiple, imputation, 95, confidence, intervals, for, all, 165̇, unique, values, of, x, y
#> 36                                                                                                                                                                                                                                                                                                                                                                                    it, seems, plausible, that, the, probability, of, misreporting, education, as, well, as, the, reported, value, itself, when, errors, are, made, depend, on, both, sex, and, true, education, level, additionally, the, prior, distribution, from, the, 1993, linked, data, pulls, estimates, in, groups, with, little, sample, size, to, measurement, error, distributions, that, seem, more, plausible, on, face, value, however, one, need, not, use, the, data, fusion, framework, for, measurement, error, to, select, a, single, model, rather, one, can, use, the, framework, to, examine, sensitivity, of, analyses, to, the, different, specifications
#> 37                                                                                                                                                                                                                                                                                                                                      additionally, the, prior, distribution, from, the, 1993, linked, data, pulls, estimates, in, groups, with, little, sample, size, to, measurement, error, distributions, that, seem, more, plausible, on, face, value, however, one, need, not, use, the, data, fusion, framework, for, measurement, error, to, select, a, single, model, rather, one, can, use, the, framework, to, examine, sensitivity, of, analyses, to, the, different, specifications, 4.3.2, sensitivity, analyses, figure, 2, displays, the, multiply, imputed, survey, weighted, inferences, for, the, total, number, of, women, with, science, and, engineering, degrees, computing, using, the, acs, specific, indicator, variable
#> 38                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  we, note, that, using, the, acs, reported, education, without, adjustments, results, in, substantially, higher, estimated, totals, at, the, professional, and, ph, d, levels, than, any, of, the, models, that, account, for, measurement, error, we, also, note, that, the, cia, model, yields, considerably, lower, counts, for, all, but, bachelor’s, degrees
#> 39                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           d, recipients, than, the, other, models, 5, concluding, remarks, the, framework, presented, in, this, article, offers, analysts, tools, for, using, the, information, in, a, high, quality, separate, data, source, to, adjust, for, measurement, errors, in, the, database, of, interest, key, to, the, framework, is, to, replace, conditional, independence, assumptions, typically, used, in, data, fusion, with, carefully, considered, measurement, error, models
#> 40                                                                                                                                                                                                                                                                                                                                                                                                                                                    5, concluding, remarks, the, framework, presented, in, this, article, offers, analysts, tools, for, using, the, information, in, a, high, quality, separate, data, source, to, adjust, for, measurement, errors, in, the, database, of, interest, key, to, the, framework, is, to, replace, conditional, independence, assumptions, typically, used, in, data, fusion, with, carefully, considered, measurement, error, models, this, avoids, sacrificing, information, and, facilitates, analysis, of, the, sensitivity, of, conclusions, to, alternative, measurement, error, specifications
#> 41                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          key, to, the, framework, is, to, replace, conditional, independence, assumptions, typically, used, in, data, fusion, with, carefully, considered, measurement, error, models, this, avoids, sacrificing, information, and, facilitates, analysis, of, the, sensitivity, of, conclusions, to, alternative, measurement, error, specifications, analysts, can, use, diagnostic, tests, to, rule, out, some, measurement, error, models, and, perform, sensibility, tests, on, others, to, identify, reasonable, candidates
#> 42                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    this, avoids, sacrificing, information, and, facilitates, analysis, of, the, sensitivity, of, conclusions, to, alternative, measurement, error, specifications, analysts, can, use, diagnostic, tests, to, rule, out, some, measurement, error, models, and, perform, sensibility, tests, on, others, to, identify, reasonable, candidates, 24
#> 43                                                                                                                                                                                                                                                                                                                                                                                                                                                                              the, acs, estimate, is, the, survey, weighted, estimate, based, on, the, reported, education, level, in, the, 2010, acs, besides, survey, sampling, contexts, like, the, one, considered, here, involving, the, acs, and, nscg, the, framework, offers, potential, approaches, for, dealing, with, possible, measurement, errors, in, organic, big, data, this, is, increasingly, important, as, data, stewards, and, analysts, consider, replacing, or, supplementing, high, quality, but, expensive, surveys, with, inexpensive, and, large, sample, organic, data
#> 44                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  this, is, increasingly, important, as, data, stewards, and, analysts, consider, replacing, or, supplementing, high, quality, but, expensive, surveys, with, inexpensive, and, large, sample, organic, data, often, scant, attention, is, paid, to, the, potential, impact, of, measurement, errors, on, inferences, from, those, data, the, framework, could, be, used, with, high, quality, validated, surveys, as, the, gold, standard, data, allowing, for, adjustments, to, the, error, prone, organic, data
#> 45                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              m, 2005, imputation, of, binary, treatment, variables, with, measurement, error, in, administrative, data, journal, of, the, american, statistical, association, 100, 1123, 1132, 29
       
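# multicolumn PDFs can also be split into a single column (split_pdf = TRUE);
# surround_lines = 1 additionally returns one line of text on each side of the match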
keyword_directory(directory, 
       keyword = c('repeated measures', 'measurement error'),
       split_pdf = TRUE, remove_hyphen = FALSE,
       surround_lines = 1, full_names = TRUE)
#>    ID       pdf_name           keyword page_num line_num
#> 1   1 1501.00450.pdf repeated measures        1        5
#> 2   1 1501.00450.pdf repeated measures        2       42
#> 3   1 1501.00450.pdf repeated measures        2       43
#> 4   1 1501.00450.pdf repeated measures        2       50
#> 5   1 1501.00450.pdf repeated measures        2       51
#> 6   1 1501.00450.pdf repeated measures        6      211
#> 7   1 1501.00450.pdf repeated measures        6      219
#> 8   1 1501.00450.pdf repeated measures        7      225
#> 9   1 1501.00450.pdf repeated measures        9      353
#> 10  2 1610.00147.pdf measurement error        1        2
#> 11  2 1610.00147.pdf measurement error        1       10
#> 12  2 1610.00147.pdf measurement error        1       12
#> 13  2 1610.00147.pdf measurement error        2       15
#> 14  2 1610.00147.pdf measurement error        2       17
#> 15  2 1610.00147.pdf measurement error        2       18
#> 16  2 1610.00147.pdf measurement error        3       34
#> 17  2 1610.00147.pdf measurement error        4       41
#> 18  2 1610.00147.pdf measurement error        4       42
#> 19  2 1610.00147.pdf measurement error        4       44
#> 20  2 1610.00147.pdf measurement error        4       50
#> 21  2 1610.00147.pdf measurement error        5       59
#> 22  2 1610.00147.pdf measurement error        6       75
#> 23  2 1610.00147.pdf measurement error        7       94
#> 24  2 1610.00147.pdf measurement error        8      106
#> 25  2 1610.00147.pdf measurement error       12      152
#> 26  2 1610.00147.pdf measurement error       12      157
#> 27  2 1610.00147.pdf measurement error       14      185
#> 28  2 1610.00147.pdf measurement error       14      187
#> 29  2 1610.00147.pdf measurement error       15      199
#> 30  2 1610.00147.pdf measurement error       17      219
#> 31  2 1610.00147.pdf measurement error       17      227
#> 32  2 1610.00147.pdf measurement error       18      239
#> 33  2 1610.00147.pdf measurement error       22      302
#> 34  2 1610.00147.pdf measurement error       22      303
#> 35  2 1610.00147.pdf measurement error       23      314
#> 36  2 1610.00147.pdf measurement error       24      328
#> 37  2 1610.00147.pdf measurement error       24      329
#> 38  2 1610.00147.pdf measurement error       24      330
#> 39  2 1610.00147.pdf measurement error       24      331
#> 40  2 1610.00147.pdf measurement error       25      337
#> 41  2 1610.00147.pdf measurement error       30      447
#>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             line_text
#> 1                                                                                                                                       This limits the number of candidate variations to be evaluated, and the speed new feature iterations. , We introduce more sophisticated experimental designs, specifi- cally the repeated measures design, including the crossover design and related variants, to increase KPI sensitivity with the same traffic size and duration of experiment. , In this pa- per we present FORME (Flexible Online Repeated Measures Experiment), a flexible and scalable framework for these de- signs. 
#> 2                                                                           In particular, in the pre-experiment stage, all users received the default feature C (control) and none received the new feature T (treatment). , Groups A/B Test CUPED Parallel Crossover Re-Randomized Table 1: Repeated Measures Designs In this paper we extend the idea further by employing the repeated measures design in different stages of treatment assignment. , The traditional A/B test can be analyzed us- ing the repeated measures analysis, reporting a “per week” treatment effect, as show in row 3 “parallel” design in ta- ble 1. 
#> 3                                                                   Groups A/B Test CUPED Parallel Crossover Re-Randomized Table 1: Repeated Measures Designs In this paper we extend the idea further by employing the repeated measures design in different stages of treatment assignment. , The traditional A/B test can be analyzed us- ing the repeated measures analysis, reporting a “per week” treatment effect, as show in row 3 “parallel” design in ta- ble 1. , The two week experiment can be considered to be conducted in two periods, even though users received the same treatment assignment during both periods. 
#> 4                                                                                                                                                                                                  This way each user serves as his/her own control in the measurement. , In fact, the crossover design is a type of repeated measures design commonly used in biomedical research to control for within-subject variation. , We also discuss practical consid- erations to repeated measures design, with variants to the crossover design to study the carry over effect, including the “re-randomized” design (row 5 in table 1). 
#> 5                                                                                                                                                       In fact, the crossover design is a type of repeated measures design commonly used in biomedical research to control for within-subject variation. , We also discuss practical consid- erations to repeated measures design, with variants to the crossover design to study the carry over effect, including the “re-randomized” design (row 5 in table 1). , Motivation In this paper, we propose a framework called FORME (Flex- ible Online Repeated Measures Experiment). 
#> 6                           5.  , FLEXIBLE AND SCALABLE REPEATED MEASURES ANALYSIS VIA FORME 5.1 Review of Existing Methods It is common to analyze data from repeated measures design with the repeated measures ANOVA model and the F-test, under certain assumptions, such as normality, sphericity (ho- mogeneity of variances in differences between each pair of within-subject values), equal time points between subjects, and no missing data. , Such assumptions in general do not hold for large-scale online experiments, where the assign- ment of users into different treatment group may not be completely balanced. 
#> 7                                                                                                                                                                                                                                                                                                       Random effect makes modeling within-subject variability possible. , In repeated measures data, users might appear in multiple periods, represented as multiple rows in the dataset. , As a result, rows of the dataset are not independent but Metrics Beyond Average “baseline” measurement is captured as a random effect. 
#> 8                                                                                                                                                                                                                                                                                                      To see this, each user’s periods of the measurement, user id, and any other covariate. , As an example, one possible model for repeated measures In our cases they are indicators of treatment assignment,, improves accuracy in the estimation of treatment effect, sim- ilar to the illustration we derived in Section 2.4. 
#> 9                                                                                                                                                                                                                                                                                                      Note the drastic reduction in variance for such metrics means the same feature can be tested with only 1/3 of the original traffic! , At the design stage, we face a few choices under the same framework of repeated measures design. , Experimenters should use domain knowledge and past experiments to inform the design. 
#> 10                                                                                                                                                                                                                                arXiv:1610.00147v1 [stat.ME] 1 Oct 2016 Data Fusion for Correcting Measurement Errors Tracy Schifeling, Jerome P. , Reiter, Maria DeYoreo∗ Abstract Often in surveys, key items are subject to measurement errors. , Given just the data, it can be difficult to determine the distribution of this error process, and hence to obtain accurate inferences that involve the error-prone variables. 
#> 11                                                                                                                                                                                                                                                                                                                            In doing so, we account for the informative sampling design used to select the National Survey of College Graduates. , We also present a process for assessing the sensitivity of various analyses to different choices for the measurement error models. , Supplemental material is available online. 
#> 12                                                                                                                                                                                                                                                                                                                                                                                                                        Supplemental material is available online. , KEY WORDS: fusion, imputation, measurement error, missing, survey. , This research was supported by The National Science Foundation under award SES-11-31897. 
#> 13                                                                                                                                                                           The authors wish to thank Seth Sanders for his input on informative prior specifications, and Mauricio Sadinle for discussion that improved the strategy for accounting for the informative sample design., Survey data often contain items that are subject to measurement errors. , For example, some respondents might misunderstand a question or accidentally select the wrong response, thereby providing values unequal to their factual values. 
#> 14                                                                                                                                                                                                                                         For example, some respondents might misunderstand a question or accidentally select the wrong response, thereby providing values unequal to their factual values. , Left uncorrected, these measurement errors can result in degraded inferences (Kim et al., 2015). , Unfor- tunately, the distribution of the measurement errors typically is not estimable from the survey data alone. 
#> 15                                                                                                                                                                                                          Left uncorrected, these measurement errors can result in degraded inferences (Kim et al., 2015). , Unfor- tunately, the distribution of the measurement errors typically is not estimable from the survey data alone. , One either needs to make strong assumptions about the measure- ment error process (e.g., as in Curran and Hussong, 2009), or leverage information from some other source of data, as we do here. 
#> 16          It does not make sense to alter every individual’s reported values in the survey, as would be done using a conditional independence approach. , In this article, we develop a framework for leveraging information from gold stan- dard data to improve inferences in surveys subject to measurement errors. , The basic idea is to encode plausible assumptions about the error process, e.g., most people do not make errors when reporting educational attainments, and the reporting process, e.g., when people make errors, they are more likely to report higher attainments than actual, into statistical models. 
#> 17                                                                                            example of misreporting of educational attainment in data collected by the Census Bureau, so as to motivate the methodological developments. , In Section 3, we intro- duce the general framework for specifying measurement error models to leverage the information in gold standard data. , In Section 4, we apply the framework to handle po- tential measurement error in educational attainment in the 2010 American Community Survey (ACS), using the 2010 National Survey of College Graduates (NSCG) as a gold standard file. 
#> 18                                                                                           In Section 3, we intro- duce the general framework for specifying measurement error models to leverage the information in gold standard data. , In Section 4, we apply the framework to handle po- tential measurement error in educational attainment in the 2010 American Community Survey (ACS), using the 2010 National Survey of College Graduates (NSCG) as a gold standard file. , In doing so, we deal with a key complication in the data integration: accounting for the informative sampling design used to sample the NSCG. 
#> 19                                                                                                                                                                                                                                                                                     In doing so, we deal with a key complication in the data integration: accounting for the informative sampling design used to sample the NSCG. , We also demonstrate how the framework facilitates analysis of the sensitivity of conclusions to different measurement error model specifications. , In Section 5, we provide a brief summary. 
#> 20                                                                                                                                                                           These questions greatly reduce the possibility of respondent error, so that the educational attainment values in the NSCG can be considered a gold standard (Black et al., 2003). , The census long form, in contrast, did not include detailed follow up questions, so that reported educational attainment is prone to measurement error. , The Census Bureau linked each individual in the NSCG to their corresponding record in the long form data. 
#> 21                                                                                                                                                                                                                                                                   NSCG- reported education  No Degree 1993). , Because of the linkages, we can characterize the actual measurement error mechanism for educational attainment in the 1990 long form data. , In the NSCG, we treat the highest degree of the three most recent degrees reported (coded as “ed6c1”, “ed6c2”, and “ed6c3” in the file) as the true education level. 
#> 22                                                                                                                                                   Of the individuals in the NSCG who had at least a college degree at the time of the 1990 census, about 93.3% of them have the same contemporaneous education levels in both files. , This suggests that most people report correctly, an observation we want to leverage when constructing measurement error models for education in the 2010 ACS. , In most situations, we do not have the good fortune of observing individuals’ error- prone and true values simultaneously. 
#> 23                                                                                                                                                                                                                                                            Additionally, DE can include variables for which there is no corresponding variable in DG . , These variables do not play a role in the measurement error modeling, although they can be used in multiple imputation inferences. , We seek to estimate Pr(Y, Z | X), and use it to create multiple imputations for the missing values in Y for the individuals in DE . 
#> 24                                                                                                                                                                                 Using E enables us to write Pr(Y, Z | X) as a product of three sub-models. , For individual i, the full data likelihood (omitting parameters for simplicity) can be factored as Pr(Yi = k, Zi = l | Xi ) = Pr(Yi = k | Xi ) This separates the true data generation process and the measurement error generation process, which facilitates model specification. , In particular, we can use DG to estimate the true data distribution Pr(Y | X). 
#> 25                                                                                                                                                                                                                                                                                                            Without linked data, analysts cannot use exploratory data analysis to inform the model choice. , Instead, we recommend that analysts posit scientifically defensible measurement error models, and make post-hoc checks of the sensibility of analyses from those models. , We demonstrate this approach in Section 4. 
#> 26                                                                                                                                                                                     This is akin to diagnostics in multiple imputation for missing data that compare imputed and observed values (Abayomi et al., 2008). , When these distributions differ substantially, it suggests the measurement error model specification (or possibly the true data model) is inadequate. , Such diagnostic checks only can reveal problems with the model specification; they do not indicate that a particular specification is correct. 
#> 27                                                                                                                                                                                                                                                                                                                                                In the NSCG, we discarded 38 records with race suppressed, leaving a sample size of nG = 77, 150. , We consider two sets of measurement error model specifications. , The first set uses specifications like those in Section 3, with flat prior distributions for all parameters. 
#> 28                                                                                                                                                                                                                                                 The first set uses specifications like those in Section 3, with flat prior distributions for all parameters. , We use this set to illustrate model diagnostics and sensitivity analysis absent prior information about the measurement error process. , The second set uses a common error and reporting model with different, informative prior distributions on its parameters. 
#> 29                                                                                                                                                                                                     First, we use survey-weighted inferences to estimate population totals of (Y | X) from the 2010 NSCG. , Second, we turn these estimates into an approximate Bayesian posterior distribution for input to fitting the measurement error models used to impute plausible values of Yi for individuals in the ACS. , We now describe this process, which can be used generally when DG is collected via a complex survey design. 
#> 30                                                                                                                                                                                                                                                                                                                                                                                                                                                         (8), Table 2: Summary of the first four measurement error model specifications for 2010 NSCG/ACS analysis. , These models use flat prior distributions on all parameters. 
#> 31                                                                                                                                                                                                                                                      We include an example of this entire procedure in the supplementary material. , The two sets of measurement error models include four that use flat prior distributions and three that use informative prior distributions based on the 1993 linked data. , For all error models, we use a logistic regression of Ei on various main effects and interactions of Yi and Xi . 
#> 32                                                                                                                                                                    In Model 4, the error and reporting models both depend on Y and sex. , For Models 5 – 7, we use the specification in Model 4 and incorporate prior in- formation about the measurement errors from the 1993 linked data. , In constructing the priors, we first remove records that have been flagged as having missing education that has been imputed, because these imputations might not closely reflect the actual education values (Black et al., 2003). 
#> 33                                              It seems plausible that the probability of misreporting education, as well as the reported value itself when errors are made, depend on both sex and true education level. , Additionally, the prior distribution from the 1993 linked data pulls estimates in groups with little sample size to measurement error distributions that seem more plausible on face value. , However, one need not use the data fusion framework for measurement error to select a single model; rather, one can use the framework to examine sensitivity of analyses to the different specifications. 
#> 34                       Additionally, the prior distribution from the 1993 linked data pulls estimates in groups with little sample size to measurement error distributions that seem more plausible on face value. , However, one need not use the data fusion framework for measurement error to select a single model; rather, one can use the framework to examine sensitivity of analyses to the different specifications. , 4.3.2 Figure 2 displays the multiply-imputed, survey-weighted inferences for the total number of women with science and engineering degrees, computing using the ACS-specific indicator variable. 
#> 35 We note that using the ACS-reported education without adjustments results in substantially higher estimated totals at the professional and Ph. , D. levels than any of the models that account for measurement error. , We also note that the CIA model yields considerably lower counts for all but bachelor’s degrees. degrees, the point estimates for Models 4 – 7 are reasonably close, with Models 4 x 10 x 10 CIA model model 4 Estimated total no. of sci. and eng. degrees model 6 model 7 awarded to women Model x 10 awarded to women Model Figure 3 displays inferences for the average income for different degrees. 
#> 36                                                                                                                                                                                                                                  D. recipients than the other models. , The framework presented in this article offers analysts tools for using the information in a high quality, separate data source to adjust for measurement errors in the database of interest. , Key to the framework is to replace conditional independence assumptions typically used in data fusion with carefully considered measurement error models. 
#> 37                                                                                                                       The framework presented in this article offers analysts tools for using the information in a high quality, separate data source to adjust for measurement errors in the database of interest. , Key to the framework is to replace conditional independence assumptions typically used in data fusion with carefully considered measurement error models. , This avoids sacrificing information and facilitates analysis of the sensitivity of conclusions to alternative measurement error specifications. 
#> 38                                                                                                                                                             Key to the framework is to replace conditional independence assumptions typically used in data fusion with carefully considered measurement error models. , This avoids sacrificing information and facilitates analysis of the sensitivity of conclusions to alternative measurement error specifications. , Analysts can use diagnostic tests to rule out some measurement error models, and perform sensibility tests on others to identify reasonable candidates. 
#> 39                                                                                                                                                                                                                                                                                                     This avoids sacrificing information and facilitates analysis of the sensitivity of conclusions to alternative measurement error specifications. , Analysts can use diagnostic tests to rule out some measurement error models, and perform sensibility tests on others to identify reasonable candidates. , Concluding Remarks
#> 40                                                                                                This is increasingly important, as data stewards and analysts consider replacing or supplementing high quality but expensive surveys with inexpensive and large-sample organic data. , Often, scant attention is paid to the potential impact of measurement errors on inferences from those data. , The framework could be used with high quality, validated surveys as the gold standard data, allowing for adjustments to the error-prone organic data. x 10 CIA model model 4 model 5 model 6 model 7 Education level Prof none
#> 41                                                                                                                                                                                                                                                                                                                                                                                       M. , (2005), “Imputation of binary treatment variables with measurement error in administrative data,” Journal of the American Statistical Association, 100, 1123–1132., Table 4: Error rate estimates from different model specifications. 
#>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  token_text
#> 1                                                                                                                                                                           this, limits, the, number, of, candidate, variations, to, be, evaluated, and, the, speed, new, feature, iterations, we, introduce, more, sophisticated, experimental, designs, specifi, cally, the, repeated, measures, design, including, the, crossover, design, and, related, variants, to, increase, kpi, sensitivity, with, the, same, traffic, size, and, duration, of, experiment, in, this, pa, per, we, present, forme, flexible, online, repeated, measures, experiment, a, flexible, and, scalable, framework, for, these, de, signs
#> 2                                                                                                 in, particular, in, the, pre, experiment, stage, all, users, received, the, default, feature, c, control, and, none, received, the, new, feature, t, treatment, groups, a, b, test, cuped, parallel, crossover, re, randomized, table, 1, repeated, measures, designs, in, this, paper, we, extend, the, idea, further, by, employing, the, repeated, measures, design, in, different, stages, of, treatment, assignment, the, traditional, a, b, test, can, be, analyzed, us, ing, the, repeated, measures, analysis, reporting, a, per, week, treatment, effect, as, show, in, row, 3, parallel, design, in, ta, ble, 1
#> 3                                                                                   groups, a, b, test, cuped, parallel, crossover, re, randomized, table, 1, repeated, measures, designs, in, this, paper, we, extend, the, idea, further, by, employing, the, repeated, measures, design, in, different, stages, of, treatment, assignment, the, traditional, a, b, test, can, be, analyzed, us, ing, the, repeated, measures, analysis, reporting, a, per, week, treatment, effect, as, show, in, row, 3, parallel, design, in, ta, ble, 1, the, two, week, experiment, can, be, considered, to, be, conducted, in, two, periods, even, though, users, received, the, same, treatment, assignment, during, both, periods
#> 4                                                                                                                                                                                                                                     this, way, each, user, serves, as, his, her, own, control, in, the, measurement, in, fact, the, crossover, design, is, a, type, of, repeated, measures, design, commonly, used, in, biomedical, research, to, control, for, within, subject, variation, we, also, discuss, practical, consid, erations, to, repeated, measures, design, with, variants, to, the, crossover, design, to, study, the, carry, over, effect, including, the, re, randomized, design, row, 5, in, table, 1
#> 5                                                                                                                                                                                           in, fact, the, crossover, design, is, a, type, of, repeated, measures, design, commonly, used, in, biomedical, research, to, control, for, within, subject, variation, we, also, discuss, practical, consid, erations, to, repeated, measures, design, with, variants, to, the, crossover, design, to, study, the, carry, over, effect, including, the, re, randomized, design, row, 5, in, table, 1, motivation, in, this, paper, we, propose, a, framework, called, forme, flex, ible, online, repeated, measures, experiment
#> 6                                           5, flexible, and, scalable, repeated, measures, analysis, via, forme, 5.1, review, of, existing, methods, it, is, common, to, analyze, data, from, repeated, measures, design, with, the, repeated, measures, anova, model, and, the, f, test, under, certain, assumptions, such, as, normality, sphericity, ho, mogeneity, of, variances, in, differences, between, each, pair, of, within, subject, values, equal, time, points, between, subjects, and, no, missing, data, such, assumptions, in, general, do, not, hold, for, large, scale, online, experiments, where, the, assign, ment, of, users, into, different, treatment, group, may, not, be, completely, balanced
#> 7                                                                                                                                                                                                                                                                                                                                                            random, effect, makes, modeling, within, subject, variability, possible, in, repeated, measures, data, users, might, appear, in, multiple, periods, represented, as, multiple, rows, in, the, dataset, as, a, result, rows, of, the, dataset, are, not, independent, but, metrics, beyond, average, baseline, measurement, is, captured, as, a, random, effect
#> 8                                                                                                                                                                                                                                                                                                                                                       to, see, this, each, user’s, periods, of, the, measurement, user, id, and, any, other, covariate, as, an, example, one, possible, model, for, repeated, measures, in, our, cases, they, are, indicators, of, treatment, assignment, improves, accuracy, in, the, estimation, of, treatment, effect, sim, ilar, to, the, illustration, we, derived, in, section, 2.4
#> 9                                                                                                                                                                                                                                                                                                                                                 note, the, drastic, reduction, in, variance, for, such, metrics, means, the, same, feature, can, be, tested, with, only, 1, 3, of, the, original, traffic, at, the, design, stage, we, face, a, few, choices, under, the, same, framework, of, repeated, measures, design, experimenters, should, use, domain, knowledge, and, past, experiments, to, inform, the, design
#> 10                                                                                                                                                                                                                                                                             arxiv, 1610.00147v1, stat.me, 1, oct, 2016, data, fusion, for, correcting, measurement, errors, tracy, schifeling, jerome, p, reiter, maria, deyoreo, abstract, often, in, surveys, key, items, are, subject, to, measurement, errors, given, just, the, data, it, can, be, difficult, to, determine, the, distribution, of, this, error, process, and, hence, to, obtain, accurate, inferences, that, involve, the, error, prone, variables
#> 11                                                                                                                                                                                                                                                                                                                                                                                in, doing, so, we, account, for, the, informative, sampling, design, used, to, select, the, national, survey, of, college, graduates, we, also, present, a, process, for, assessing, the, sensitivity, of, various, analyses, to, different, choices, for, the, measurement, error, models, supplemental, material, is, available, online
#> 12                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 supplemental, material, is, available, online, key, words, fusion, imputation, measurement, error, missing, survey, this, research, was, supported, by, the, national, science, foundation, under, award, ses, 11, 31897
#> 13                                                                                                                                                                                                             the, authors, wish, to, thank, seth, sanders, for, his, input, on, informative, prior, specifications, and, mauricio, sadinle, for, discussion, that, improved, the, strategy, for, accounting, for, the, informative, sample, design, survey, data, often, contain, items, that, are, subject, to, measurement, errors, for, example, some, respondents, might, misunderstand, a, question, or, accidentally, select, the, wrong, response, thereby, providing, values, unequal, to, their, factual, values
#> 14                                                                                                                                                                                                                                                                                            for, example, some, respondents, might, misunderstand, a, question, or, accidentally, select, the, wrong, response, thereby, providing, values, unequal, to, their, factual, values, left, uncorrected, these, measurement, errors, can, result, in, degraded, inferences, kim, et, al, 2015, unfor, tunately, the, distribution, of, the, measurement, errors, typically, is, not, estimable, from, the, survey, data, alone
#> 15                                                                                                                                                                                                                                                        left, uncorrected, these, measurement, errors, can, result, in, degraded, inferences, kim, et, al, 2015, unfor, tunately, the, distribution, of, the, measurement, errors, typically, is, not, estimable, from, the, survey, data, alone, one, either, needs, to, make, strong, assumptions, about, the, measure, ment, error, process, e.g, as, in, curran, and, hussong, 2009, or, leverage, information, from, some, other, source, of, data, as, we, do, here
#> 16                         it, does, not, make, sense, to, alter, every, individual’s, reported, values, in, the, survey, as, would, be, done, using, a, conditional, independence, approach, in, this, article, we, develop, a, framework, for, leveraging, information, from, gold, stan, dard, data, to, improve, inferences, in, surveys, subject, to, measurement, errors, the, basic, idea, is, to, encode, plausible, assumptions, about, the, error, process, e.g, most, people, do, not, make, errors, when, reporting, educational, attainments, and, the, reporting, process, e.g, when, people, make, errors, they, are, more, likely, to, report, higher, attainments, than, actual, into, statistical, models
#> 17                                                                                                                      example, of, misreporting, of, educational, attainment, in, data, collected, by, the, census, bureau, so, as, to, motivate, the, methodological, developments, in, section, 3, we, intro, duce, the, general, framework, for, specifying, measurement, error, models, to, leverage, the, information, in, gold, standard, data, in, section, 4, we, apply, the, framework, to, handle, po, tential, measurement, error, in, educational, attainment, in, the, 2010, american, community, survey, acs, using, the, 2010, national, survey, of, college, graduates, nscg, as, a, gold, standard, file
#> 18                                                                                                                  in, section, 3, we, intro, duce, the, general, framework, for, specifying, measurement, error, models, to, leverage, the, information, in, gold, standard, data, in, section, 4, we, apply, the, framework, to, handle, po, tential, measurement, error, in, educational, attainment, in, the, 2010, american, community, survey, acs, using, the, 2010, national, survey, of, college, graduates, nscg, as, a, gold, standard, file, in, doing, so, we, deal, with, a, key, complication, in, the, data, integration, accounting, for, the, informative, sampling, design, used, to, sample, the, nscg
#> 19                                                                                                                                                                                                                                                                                                                                    in, doing, so, we, deal, with, a, key, complication, in, the, data, integration, accounting, for, the, informative, sampling, design, used, to, sample, the, nscg, we, also, demonstrate, how, the, framework, facilitates, analysis, of, the, sensitivity, of, conclusions, to, different, measurement, error, model, specifications, in, section, 5, we, provide, a, brief, summary
#> 20                                                                                                                                                                                                             these, questions, greatly, reduce, the, possibility, of, respondent, error, so, that, the, educational, attainment, values, in, the, nscg, can, be, considered, a, gold, standard, black, et, al, 2003, the, census, long, form, in, contrast, did, not, include, detailed, follow, up, questions, so, that, reported, educational, attainment, is, prone, to, measurement, error, the, census, bureau, linked, each, individual, in, the, nscg, to, their, corresponding, record, in, the, long, form, data
#> 21                                                                                                                                                                                                                                                                                                                          nscg, reported, education, no, degree, 1993, because, of, the, linkages, we, can, characterize, the, actual, measurement, error, mechanism, for, educational, attainment, in, the, 1990, long, form, data, in, the, nscg, we, treat, the, highest, degree, of, the, three, most, recent, degrees, reported, coded, as, ed6c1, ed6c2, and, ed6c3, in, the, file, as, the, true, education, level
#> 22                                                                                                                                                                            of, the, individuals, in, the, nscg, who, had, at, least, a, college, degree, at, the, time, of, the, 1990, census, about, 93.3, of, them, have, the, same, contemporaneous, education, levels, in, both, files, this, suggests, that, most, people, report, correctly, an, observation, we, want, to, leverage, when, constructing, measurement, error, models, for, education, in, the, 2010, acs, in, most, situations, we, do, not, have, the, good, fortune, of, observing, individuals, error, prone, and, true, values, simultaneously
#> 23                                                                                                                                                                                                                                                                                                       additionally, de, can, include, variables, for, which, there, is, no, corresponding, variable, in, dg, these, variables, do, not, play, a, role, in, the, measurement, error, modeling, although, they, can, be, used, in, multiple, imputation, inferences, we, seek, to, estimate, pr, y, z, x, and, use, it, to, create, multiple, imputations, for, the, missing, values, in, y, for, the, individuals, in, de
#> 24                                                                                                                                                                                                                                   using, e, enables, us, to, write, pr, y, z, x, as, a, product, of, three, sub, models, for, individual, i, the, full, data, likelihood, omitting, parameters, for, simplicity, can, be, factored, as, pr, yi, k, zi, l, xi, pr, yi, k, xi, this, separates, the, true, data, generation, process, and, the, measurement, error, generation, process, which, facilitates, model, specification, in, particular, we, can, use, dg, to, estimate, the, true, data, distribution, pr, y, x
#> 25                                                                                                                                                                                                                                                                                                                                                                 without, linked, data, analysts, cannot, use, exploratory, data, analysis, to, inform, the, model, choice, instead, we, recommend, that, analysts, posit, scientifically, defensible, measurement, error, models, and, make, post, hoc, checks, of, the, sensibility, of, analyses, from, those, models, we, demonstrate, this, approach, in, section, 4
#> 26                                                                                                                                                                                                                              this, is, akin, to, diagnostics, in, multiple, imputation, for, missing, data, that, compare, imputed, and, observed, values, abayomi, et, al, 2008, when, these, distributions, differ, substantially, it, suggests, the, measurement, error, model, specification, or, possibly, the, true, data, model, is, inadequate, such, diagnostic, checks, only, can, reveal, problems, with, the, model, specification, they, do, not, indicate, that, a, particular, specification, is, correct
#> 27                                                                                                                                                                                                                                                                                                                                                                                                         in, the, nscg, we, discarded, 38, records, with, race, suppressed, leaving, a, sample, size, of, ng, 77, 150, we, consider, two, sets, of, measurement, error, model, specifications, the, first, set, uses, specifications, like, those, in, section, 3, with, flat, prior, distributions, for, all, parameters
#> 28                                                                                                                                                                                                                                                                                            the, first, set, uses, specifications, like, those, in, section, 3, with, flat, prior, distributions, for, all, parameters, we, use, this, set, to, illustrate, model, diagnostics, and, sensitivity, analysis, absent, prior, information, about, the, measurement, error, process, the, second, set, uses, a, common, error, and, reporting, model, with, different, informative, prior, distributions, on, its, parameters
#> 29                                                                                                                                                                                                                                        first, we, use, survey, weighted, inferences, to, estimate, population, totals, of, y, x, from, the, 2010, nscg, second, we, turn, these, estimates, into, an, approximate, bayesian, posterior, distribution, for, input, to, fitting, the, measurement, error, models, used, to, impute, plausible, values, of, yi, for, individuals, in, the, acs, we, now, describe, this, process, which, can, be, used, generally, when, dg, is, collected, via, a, complex, survey, design
#> 30                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               8, table, 2, summary, of, the, first, four, measurement, error, model, specifications, for, 2010, nscg, acs, analysis, these, models, use, flat, prior, distributions, on, all, parameters
#> 31                                                                                                                                                                                                                                                                                           we, include, an, example, of, this, entire, procedure, in, the, supplementary, material, the, two, sets, of, measurement, error, models, include, four, that, use, flat, prior, distributions, and, three, that, use, informative, prior, distributions, based, on, the, 1993, linked, data, for, all, error, models, we, use, a, logistic, regression, of, ei, on, various, main, effects, and, interactions, of, yi, and, xi
#> 32                                                                                                                                                                                                    in, model, 4, the, error, and, reporting, models, both, depend, on, y, and, sex, for, models, 5, 7, we, use, the, specification, in, model, 4, and, incorporate, prior, in, formation, about, the, measurement, errors, from, the, 1993, linked, data, in, constructing, the, priors, we, first, remove, records, that, have, been, flagged, as, having, missing, education, that, has, been, imputed, because, these, imputations, might, not, closely, reflect, the, actual, education, values, black, et, al, 2003
#> 33                                                           it, seems, plausible, that, the, probability, of, misreporting, education, as, well, as, the, reported, value, itself, when, errors, are, made, depend, on, both, sex, and, true, education, level, additionally, the, prior, distribution, from, the, 1993, linked, data, pulls, estimates, in, groups, with, little, sample, size, to, measurement, error, distributions, that, seem, more, plausible, on, face, value, however, one, need, not, use, the, data, fusion, framework, for, measurement, error, to, select, a, single, model, rather, one, can, use, the, framework, to, examine, sensitivity, of, analyses, to, the, different, specifications
#> 34                                    additionally, the, prior, distribution, from, the, 1993, linked, data, pulls, estimates, in, groups, with, little, sample, size, to, measurement, error, distributions, that, seem, more, plausible, on, face, value, however, one, need, not, use, the, data, fusion, framework, for, measurement, error, to, select, a, single, model, rather, one, can, use, the, framework, to, examine, sensitivity, of, analyses, to, the, different, specifications, 4.3.2, figure, 2, displays, the, multiply, imputed, survey, weighted, inferences, for, the, total, number, of, women, with, science, and, engineering, degrees, computing, using, the, acs, specific, indicator, variable
#> 35 we, note, that, using, the, acs, reported, education, without, adjustments, results, in, substantially, higher, estimated, totals, at, the, professional, and, ph, d, levels, than, any, of, the, models, that, account, for, measurement, error, we, also, note, that, the, cia, model, yields, considerably, lower, counts, for, all, but, bachelor’s, degrees, degrees, the, point, estimates, for, models, 4, 7, are, reasonably, close, with, models, 4, x, 10, x, 10, cia, model, model, 4, estimated, total, no, of, sci, and, eng, degrees, model, 6, model, 7, awarded, to, women, model, x, 10, awarded, to, women, model, figure, 3, displays, inferences, for, the, average, income, for, different, degrees
#> 36                                                                                                                                                                                                                                                                          d, recipients, than, the, other, models, the, framework, presented, in, this, article, offers, analysts, tools, for, using, the, information, in, a, high, quality, separate, data, source, to, adjust, for, measurement, errors, in, the, database, of, interest, key, to, the, framework, is, to, replace, conditional, independence, assumptions, typically, used, in, data, fusion, with, carefully, considered, measurement, error, models
#> 37                                                                                                                                                   the, framework, presented, in, this, article, offers, analysts, tools, for, using, the, information, in, a, high, quality, separate, data, source, to, adjust, for, measurement, errors, in, the, database, of, interest, key, to, the, framework, is, to, replace, conditional, independence, assumptions, typically, used, in, data, fusion, with, carefully, considered, measurement, error, models, this, avoids, sacrificing, information, and, facilitates, analysis, of, the, sensitivity, of, conclusions, to, alternative, measurement, error, specifications
#> 38                                                                                                                                                                                                 key, to, the, framework, is, to, replace, conditional, independence, assumptions, typically, used, in, data, fusion, with, carefully, considered, measurement, error, models, this, avoids, sacrificing, information, and, facilitates, analysis, of, the, sensitivity, of, conclusions, to, alternative, measurement, error, specifications, analysts, can, use, diagnostic, tests, to, rule, out, some, measurement, error, models, and, perform, sensibility, tests, on, others, to, identify, reasonable, candidates
#> 39                                                                                                                                                                                                                                                                                                                                                          this, avoids, sacrificing, information, and, facilitates, analysis, of, the, sensitivity, of, conclusions, to, alternative, measurement, error, specifications, analysts, can, use, diagnostic, tests, to, rule, out, some, measurement, error, models, and, perform, sensibility, tests, on, others, to, identify, reasonable, candidates, concluding, remarks
#> 40                                                                                                                this, is, increasingly, important, as, data, stewards, and, analysts, consider, replacing, or, supplementing, high, quality, but, expensive, surveys, with, inexpensive, and, large, sample, organic, data, often, scant, attention, is, paid, to, the, potential, impact, of, measurement, errors, on, inferences, from, those, data, the, framework, could, be, used, with, high, quality, validated, surveys, as, the, gold, standard, data, allowing, for, adjustments, to, the, error, prone, organic, data, x, 10, cia, model, model, 4, model, 5, model, 6, model, 7, education, level, prof, none
#> 41                                                                                                                                                                                                                                                                                                                                                                                                                                                               m, 2005, imputation, of, binary, treatment, variables, with, measurement, error, in, administrative, data, journal, of, the, american, statistical, association, 100, 1123, 1132, table, 4, error, rate, estimates, from, different, model, specifications