Arbitrary Metrics and Questionable Questions

Article One


In their article “Arbitrary Metrics in Psychology,” Blanton and Jaccard (2006) undertake an in-depth exploration of the arbitrary nature of metrics, which becomes a problem especially when a metric is used to draw inferences about the absolute or true standing on the psychological dimension under study. They argue, however, that the underlying issues that make metrics arbitrary can be lessened through several strategies that they enumerate.

Arguments in support of thesis

While metric arbitrariness may not be a major issue in theory development or theory testing, it is nonetheless significant in applied work. This is especially the case when, for example, a psychologist wishes to diagnose the absolute standing of an individual on a given dimension. It is also crucial when one wishes to assess the importance or magnitude of change. To reduce arbitrariness, Blanton and Jaccard (2006) argue that test developers should always endeavor to establish a robust empirical base that correlates “test scores to meaningful events and that define cutoff or threshold values that imply significantly heightened risks or benefits” (p. 38).

Implications for test validity of the positions in the article

Blanton and Jaccard (2006) hold that while studies examining the validity of a given scale can prove useful in giving meaning to a metric, the arbitrariness of metrics and its related issues are distinct from those of validity and reliability. When a score is not associated with any external referents, it remains largely arbitrary, and the values of the score are essentially meaningless. Making such an association reduces this arbitrariness, which in turn increases one’s confidence “in the validity of the measure” (Blanton & Jaccard, 2006). Conversely, when such an association is lacking, confidence in the validity is also reduced.


Blanton and Jaccard (2006) advise psychologists who are intent on reducing the arbitrariness of a given measure to make use of one or more of the following solutions. First, they need to identify the events that they consider meaningful. Second, they need to make a case for the meaningfulness of the events identified and position these events with regard to the underlying psychological dimension. Third, Blanton and Jaccard (2006) urge members of the applied or scientific community to build consensus among themselves regarding this positioning. Fourth, researchers need to undertake the research necessary to link “test scores to those events in such a way as to render the metric of the test meaningful” (Blanton & Jaccard, 2006, p. 38). Finally, Blanton and Jaccard (2006) recommend that psychologists make a case and build consensus for the threshold values used to make diagnostic statements.

Implications of the problem

While the proposed solutions give meaning to a metric, the validity of a scale and the meaning of its metric remain conceptually bound. From a technical point of view, reliability and validity are properties of data rather than properties of a scale. The validity of a given measure will therefore be determined by several aspects, including the type of instrument chosen for data collection, the time of data collection, the sample, and the setting of data collection. This is a limitation in quantitative research because these factors limit how far the meaning of a metric extends. It calls for the inclusion of generalizability studies as a means of identifying the conditions and populations “to which a metric’s meaning extends” (Blanton & Jaccard, 2006, p. 38).

Article Two


In their article “Asking Questions About Behavior: Cognition, Communication, and Questionnaire Construction,” Schwarz and Oyserman (2001) note that evaluation researchers widely use self-reports to determine the behavior of participants at the various phases of an evaluation project. Self-reports are popular because they have proven to be cost-effective and because they can capture infrequent behavior. On the other hand, self-reports have been found to be a “highly unreliable source of data” (Schwarz & Oyserman, 2001).

Arguments for the thesis

As noted earlier by Schwarz and Oyserman (2001), self-reports are a very unreliable source of data because the question format, context, and wording can have a profound influence on participants’ reports. Even the simple behavioral questions used in self-reports can pose complex cognitive tasks. In addition, Schwarz and Oyserman (2001) argue that because self-reports are so highly dependent on context, the ensuing results are likely to be affected by minor changes in question format, order, or wording. The manner in which evaluators pose questions could therefore have a huge impact on the answers obtained. Poorly worded questions go against the maxim of manner, which calls for a speaker’s contribution to be clear rather than ambiguous, wordy, or obscure. This is risky in research because participants are likely to assume that the researcher chose the wording deliberately to convey what he or she meant, and they interpret the question accordingly.

Implications for test validity of the position of the article

The authors of this article enumerate several implications of self-reports for the methodology adopted. To begin with, numeric response alternatives affect how respondents interpret the question and what it refers to. As such, the same question stem, coupled with different frequency options, could lead to an evaluation of differentially extreme behaviors. In addition, behavioral reports are likely to be influenced by respondents’ use of the frequency classes as a point of reference. This makes it hard to compare studies, as even reports on similar behavior cannot be compared when they were collected on different scales. There is also the risk of the obtained reports being homogenized by the frequency scales, while respondents also find it hard to recall “relevant episodes from memory” (Schwarz & Oyserman, 2001, p. 148). This means that the response alternatives for the behavior being studied could influence respondents with poor memory more than those with better memory. Finally, subsequent non-comparative and comparative judgments could be influenced by the range of response options.



To overcome the aforementioned problems with the reliability of self-reports in data collection by evaluation researchers, Schwarz and Oyserman (2001) propose a number of solutions. They advise evaluation researchers to first answer all the questions themselves. If they encounter difficulties in answering a certain question, there is a high probability that respondents will also struggle to answer it. They further recommend that evaluation researchers consider what meaning respondents are likely to infer from such attributes of the questionnaire as the reference period, the response alternatives, the title of the questionnaire, and the study’s sponsor. The questions should also be pilot tested with cognitive interviewing techniques, and respondents should be encouraged to give accurate answers.

Implications of the problem and proposed solutions to quantitative research questions

By answering the research questions themselves before administering them to respondents, evaluation researchers reduce the ambiguity and lack of clarity in the questions. This will increase the reliability and validity of the study’s results. Secondly, pilot testing the questionnaire will help to reduce errors in the questions, thereby enhancing its responsiveness. Again, this will go a long way toward increasing the test validity of the quantitative research questions.


References

Blanton, H., & Jaccard, J. (2006). Arbitrary metrics in psychology. American Psychologist, 61(1), 27-41.

Schwarz, N., & Oyserman, D. (2001). Asking questions about behavior: Cognition, communication, and questionnaire construction. American Journal of Evaluation, 22, 127-162.