In my thesis research, apparently thanks to the inefficiency of the courts of accounts (tribunais de contas), I have a lot of missing data in my database, which was built from fiscal, budgetary, and transparency indicators for 282 Brazilian municipalities (all with populations above 100 thousand inhabitants).
To clear my head and deal with this problem better, I am reading the book Missing Data: A Gentle Introduction, by Patrick E. McKnight, Katherine M. McKnight, Souraya Sidani, and Aurelio José Figueredo, published by Guilford Press (New York, 2007).
I found some interesting quotations in the book that I would like to share on the blog. At first I thought of translating them freely, but for lack of time, and to give readers the original text, here they are as written!
"As the old saying goes, the only certainties are death and taxes. We'd like to add one more to that list: missing data. As any social scientist can attest, missing data are virtually guaranteed in research studies."
“The best solution to handle missing data is to have none.” (R. A. Fisher)
In the book's Preface, the authors thank a colleague, Lee Sechrest, who contributed valuable suggestions for the book. According to the authors, Lee proposed four “laws” that were recently published by Bradley Smith (2006). These laws influenced the authors' thinking, so they decided to present them in the book. Here they are!
Lee’s Law Number 1: Everything eventually turns up.
This law says that missing data often are not missing. If the data are simply misplaced on our desk, a thorough search may very well turn up data we believed to be missing.
Lee’s Law Number 2: More is better than less.
We view this law not from the perspective of missing data but from the perspective of data. If we have more actual data, that can offset missing data. More missing data, however, we would not want!
Lee’s Law Number 3: Rarely is there such a thing as a true catastrophe.
Unless all information is lost, most missing data situations can either be prevented or treated. Catastrophic data loss is rare enough that we do not need to prepare ourselves for that outcome.
Lee’s Law Number 4: Nothing in statistics or research methodology requires you to do something stupid.
This last law is our guiding principle. Thoughtful approaches should always be preferred. No procedure warrants using unsound data-handling methods. People who have misused or performed contraindicated analyses should own up to the mistakes and not blame an approach. At the heart of it all, we embrace a philosophy where researchers calmly approach missing data as an interesting question rather than as a catastrophe. The researcher acts as a detective, “solving” the problem without compromising the integrity of the study.