Friday, 1 March 2013

TEST RELIABILITY

Reliability
In statistics, reliability is the consistency of a measuring instrument. This can be either whether the instrument gives (or is likely to give) the same measurement on repeated occasions or, in the case of more subjective instruments, whether two independent assessors give similar scores.
Reliability is the extent to which a test is repeatable and yields consistent scores. In order to be valid, a test must be reliable; but reliability does not guarantee validity. All measurement procedures have the potential for error, so the aim is to minimize it. The goal of estimating reliability (consistency) is to determine how much of the variability in test scores is due to measurement error and how much is due to variability in true scores.
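To make the true-score/error idea concrete, here is a minimal Python sketch, assuming the classical test theory model (observed = true + random error; all distributions and numbers below are invented for illustration). Reliability is estimated as the share of observed-score variance that comes from true scores:

```python
import random
import statistics

random.seed(42)

# Simulate 1,000 test-takers: observed score = true score + random error.
# (Illustrative assumption: true scores ~ N(100, 15), errors ~ N(0, 5).)
true_scores = [random.gauss(100, 15) for _ in range(1000)]
errors = [random.gauss(0, 5) for _ in range(1000)]
observed = [t + e for t, e in zip(true_scores, errors)]

# Reliability = true-score variance / observed-score variance.
reliability = statistics.variance(true_scores) / statistics.variance(observed)
print(f"estimated reliability: {reliability:.2f}")  # about 15**2 / (15**2 + 5**2) = 0.90
```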
For example: if a person takes a personality assessment today and scores high in a trait like dominance, and we retest that same person after six weeks, then if the person again scores high in dominance we can say that the test is reliable. If, however, the individual scored low in dominance, we would have to conclude that the measure was inaccurate and unreliable.
Reliability can be improved by taking repeated measurements using the same test and by taking many different measures using slightly different techniques and methods. For example, we would not consider one multiple-choice exam question a reliable basis for testing your knowledge of “individual differences”; many questions are asked in many different formats (e.g., exam, essay, presentation) to help provide a more reliable score.

Validity
Validity is the extent to which a test measures what it is supposed to measure. Validity is a subjective judgment made on the basis of experience and empirical indicators; it asks, “Is the test measuring what you think it’s measuring?” For example, we might define “violence” as an act intended to cause harm to another person (a conceptual definition), but the operational definition might involve observing:
  • how many times a child hits a doll
  • how often a child pushes to the front of the queue
  • how many physical scraps he/she gets into in the playground
Are these valid measures of aggression? That is, how well does the operational definition match the conceptual definition?
Difference between Reliability and Validity

Validity and reliability are two terms that go hand in hand in any form of testing, and it is important to understand the difference between them. Validity refers to evidence that a test actually measures what it is supposed to measure, while reliability is a measure of consistency. In terms of accuracy and precision, reliability is precision, while validity is accuracy.
Bathroom scale analogy
An often-used example to elucidate the difference between reliability and validity in the experimental sciences is a common bathroom scale. If someone who weighs 200 lbs steps on the scale 10 times and it reads "200" each time, then the measurement is both reliable and valid. If the scale consistently reads "150", then it is not valid, but it is still reliable because the measurement is very consistent. If the scale's readings varied a lot around 200 (190, 205, 192, 209, etc.), then the scale is accurate on average but not reliable, and because no single reading can be trusted, it cannot be considered a valid measure either.
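A quick numeric sketch of the analogy (the readings are invented to match the example): the average reading tracks accuracy (validity), while the spread of readings tracks precision (reliability).

```python
import statistics

true_weight = 200  # lbs
readings = [190, 205, 192, 209, 198, 206, 191, 208, 195, 204]

# Accuracy (validity): how close the average reading is to the true weight.
bias = statistics.mean(readings) - true_weight
# Precision (reliability): how tightly the readings cluster together.
spread = statistics.stdev(readings)

print(f"bias: {bias:+.1f} lbs, spread: {spread:.1f} lbs")
```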




TYPES OF RELIABILITY

Reliability may be estimated through a variety of methods that fall into two types: single-administration methods (split-half and internal consistency) and multiple-administration methods (test-retest and parallel forms). Multiple-administration methods require that two assessments be administered.
Each of these estimation methods is sensitive to different sources of error and so might not be expected to be equal. Reliability estimates from one sample might differ from those of a second sample if the second sample is drawn from a different population. (This is true of measures of all types--yardsticks might measure houses well yet have poor reliability when used to measure the lengths of insects.)


Inter-Rater or Inter-Observer Reliability
Inter-Rater or Inter-Observer Reliability measures homogeneity: the same form is administered to the same people by two or more raters or interviewers, so as to establish the extent of consensus on use of the instrument by those who administer it. It is used to assess the degree to which different raters/observers give consistent estimates of the same phenomenon.
There are two major ways to actually estimate inter-rater reliability. If your measurement consists of categories -- the raters are checking off which category each observation falls in -- you can calculate the percent of agreement between the raters. For instance, let's say you had 100 observations that were being rated by two raters. For each observation, the rater could check one of three categories. Imagine that on 86 of the 100 observations the raters checked the same category. In this case, the percent of agreement would be 86%. OK, it's a crude measure, but it does give an idea of how much agreement exists, and it works no matter how many categories are used for each observation.
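A minimal sketch of the percent-agreement calculation in Python (the category labels and ratings are invented, not taken from the example above):

```python
# Percent agreement between two raters over the same set of observations.
def percent_agreement(ratings_a, ratings_b):
    matches = sum(a == b for a, b in zip(ratings_a, ratings_b))
    return 100 * matches / len(ratings_a)

# Invented ratings: each rater assigns one of three categories per observation.
rater_1 = ["calm", "active", "calm", "disruptive", "active"]
rater_2 = ["calm", "active", "calm", "active", "active"]
print(f"{percent_agreement(rater_1, rater_2):.0f}% agreement")  # 4 of 5 match: 80%
```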
The other major way to estimate inter-rater reliability is appropriate when the measure is a continuous one. There, all you need to do is calculate the correlation between the ratings of the two observers. For instance, they might be rating the overall level of activity in a classroom on a 1-to-7 scale. You could have them give their rating at regular time intervals (e.g., every 30 seconds). The correlation between these ratings would give you an estimate of the reliability or consistency between the raters.
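A sketch of the continuous case, using statistics.correlation from the standard library (requires Python 3.10 or later; the 1-to-7 activity ratings are invented):

```python
from statistics import correlation  # requires Python 3.10+

# Two observers rating classroom activity on a 1-to-7 scale every 30 seconds.
observer_1 = [3, 5, 4, 6, 2, 7, 5, 4]
observer_2 = [3, 6, 4, 5, 2, 7, 4, 4]

print(f"inter-rater correlation: {correlation(observer_1, observer_2):.2f}")
```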

Test-retest Reliability
The most commonly used method of determining reliability is the test-retest method, which measures stability over time. It involves administering the test to the same group of people at least twice. The first set of scores is then correlated with the second set to determine whether scores on the first test are related to scores on the second. Correlations range between 0 (low reliability) and 1 (high reliability); it is highly unlikely they will be negative!
This approach assumes that there is no substantial change in the construct being measured between the two occasions. The amount of time allowed between measures is critical. We know that if we measure the same thing twice that the correlation between the two observations will depend in part by how much time elapses between the two measurement occasions. The shorter the time gap, the higher the correlation; the longer the time gap, the lower the correlation.
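A minimal test-retest sketch along the same lines (invented dominance scores for five people measured six weeks apart):

```python
from statistics import correlation  # requires Python 3.10+

# Dominance scores for the same five people, six weeks apart (invented data).
time_1 = [72, 55, 80, 63, 90]
time_2 = [70, 58, 78, 65, 88]

# A correlation near 1 suggests the scores are stable over the interval.
print(f"test-retest reliability: {correlation(time_1, time_2):.2f}")
```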


Split-Half Reliability
In split-half reliability we randomly divide all the items that purport to measure the same construct into two sets. We administer the entire instrument to a sample of people and calculate the total score for each randomly divided half. The split-half reliability estimate is simply the correlation between these two total scores; in the worked example it is .87. It is the relationship between one half of the items and the other half. The random division of items for split-half reliability can be done by software.
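A sketch of the split-half procedure on simulated data (the response model is invented; the Spearman-Brown correction at the end is a standard extra step for projecting the half-test correlation up to full-test length, not something described above):

```python
import random
from statistics import correlation  # requires Python 3.10+

random.seed(0)

# Simulate 50 people answering a 10-item scale: each response reflects the
# person's underlying trait level plus item-specific noise (invented model).
def simulate_person():
    trait = random.gauss(3, 1)
    return [min(5, max(1, round(trait + random.gauss(0, 0.7)))) for _ in range(10)]

responses = [simulate_person() for _ in range(50)]

# Randomly divide the ten items into two halves and total each half.
items = list(range(10))
random.shuffle(items)
half_a, half_b = items[:5], items[5:]
totals_a = [sum(p[i] for i in half_a) for p in responses]
totals_b = [sum(p[i] for i in half_b) for p in responses]

r = correlation(totals_a, totals_b)
# Spearman-Brown projects the half-test correlation to the full-length test.
print(f"half-test r = {r:.2f}, corrected = {2 * r / (1 + r):.2f}")
```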

Parallel-Forms Reliability
In parallel forms reliability we first have to create two parallel forms. One way to accomplish this is to create a large set of questions that address the same construct and then randomly divide the questions into two sets. We administer both instruments to the same sample of people. The correlation between the two parallel forms is the estimate of reliability. One major problem with this approach is that we have to be able to generate lots of items that reflect the same construct.
Furthermore, this approach assumes that the randomly divided halves are parallel or equivalent; even by chance, this will sometimes not be the case. The parallel-forms approach is very similar to split-half reliability. The major difference is that parallel forms are constructed so that the two forms can be used independently of each other and considered equivalent measures. For instance, we might be concerned about a testing threat to internal validity: if we use Form A for the pretest and Form B for the posttest, we minimize that problem. It would be even better to randomly assign individuals to receive Form A or B on the pretest and then switch them on the posttest.
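A small sketch of that counterbalanced design (the item and participant names are placeholders, not from any real instrument):

```python
import random

random.seed(1)

# A pool of items written to tap the same construct (names are placeholders).
item_pool = [f"item_{i:02d}" for i in range(1, 21)]

# Randomly divide the pool into two parallel forms, A and B.
random.shuffle(item_pool)
form_a, form_b = sorted(item_pool[:10]), sorted(item_pool[10:])

# Counterbalance: half the sample takes Form A first, half takes Form B first.
participants = [f"P{i}" for i in range(1, 9)]
random.shuffle(participants)
half = len(participants) // 2
for p in participants[:half]:
    print(p, "-> pretest: Form A, posttest: Form B")
for p in participants[half:]:
    print(p, "-> pretest: Form B, posttest: Form A")
```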

Internal Consistency Reliability
It is used to assess the consistency of results across items within a test. In internal consistency reliability estimation we use our single measurement instrument administered to a group of people on one occasion to estimate reliability. In effect we judge the reliability of the instrument by estimating how well the items that reflect the same construct yield similar results. We are looking at how consistent the results are for different items for the same construct within the measure. There are a wide variety of internal consistency measures that can be used and one of them is Average Inter-item Correlation.
Average Inter-item Correlation: The average inter-item correlation uses all of the items on our instrument that are designed to measure the same construct. We first compute the correlation between each pair of items. For example, if we have six items, there are 15 different item pairings (i.e., 15 correlations). The average inter-item correlation is simply the mean of all these correlations. In the worked example, the average inter-item correlation is .90, with individual correlations ranging from .84 to .95.
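A sketch of the calculation for a six-item instrument (the score matrix is invented, so the computed average will not match the .90 quoted above):

```python
from itertools import combinations
from statistics import correlation, mean  # requires Python 3.10+

# Rows = people, columns = six items measuring the same construct (invented).
scores = [
    [4, 5, 4, 4, 5, 4],
    [2, 2, 3, 2, 2, 3],
    [5, 4, 5, 5, 4, 5],
    [3, 3, 2, 3, 3, 2],
    [1, 2, 1, 1, 2, 1],
]

items = list(zip(*scores))  # one tuple of scores per item

# Six items give C(6, 2) = 15 item pairs; average their correlations.
pair_rs = [correlation(a, b) for a, b in combinations(items, 2)]
print(f"{len(pair_rs)} pairs, average inter-item r = {mean(pair_rs):.2f}")
```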

How reliable should tests be?  Some reliability guidelines
.90 = high reliability
.80 = moderate reliability
.70 = low reliability
  • High reliability is required when tests are used to make important decisions or when individuals are sorted into many different categories on the basis of relatively small individual differences, e.g. intelligence (most standardized tests of intelligence report reliability estimates around .90).
  • Lower reliability is acceptable when tests are used for preliminary rather than final decisions, or when tests sort people into a small number of groups based on gross individual differences, e.g. height or sociability/extraversion.
  • Reliability estimates below .60 are usually regarded as unacceptably low.

Sources of Error or Sources of Unreliability

  • The respondent’s or subject’s mood, fatigue, or motivation, which affect his or her responses
  • The observer’s measurements, which can be influenced by the same factors affecting the subject’s responses
  • The conditions under which measurement is made, which may produce responses that do not reflect true scores. Measurement errors are essentially random: a person’s test score might not reflect the true score because they were sick, anxious, in a noisy room, etc.
  • Problems with the measurement instrument, such as poorly worded questions in an interview
  • Processing problems such as simple coding or mechanical errors


Contemporary Textile Designs:
Contemporary textile designs are modifications of existing designs in response to changing times, fashion, needs, and technological development. The contemporary form of a traditional design is more stylized, and designers mix features of different motifs to create a new sense. In earlier times designs were more symbolic in nature, but today people rarely think about the hidden message behind a particular design. Now the main purpose of a textile motif is to enrich the surface of the fabric and create a decorative effect, whereas traditional motifs had a purpose behind them: some designs were made for the royal family, some for ceremonial purposes, some for war, and some as marks of respect. In the contemporary form of textile design there is no rule that a specific design is meant for a particular society, purpose, or caste.
In modern textiles, fusion of art forms is the most popular technique for creating new designs. Designers mix two art forms, such as kantha and block printing, chikankari and sequin work, or tie-dye and block printing, to give traditional arts a new look.

Unconventional Textile Fibres



India has a rich heritage of natural plant materials, owing to its wide range of climatic conditions. Common unconventional fibre plants and their uses are listed below.
Common name   Botanical name            Part used   Various end uses
banana        Musa sapientum            leaf        bags, sarees
rambans       Agave americana           leaf        towel hangers, hand purses
murva         Sansevieria trifasciata   leaf        mats, rugs
bhindi        Hibiscus ficulneus        stalk       cords, doormats, bags, rugs
hemp          Cannabis sativa           stalk       cord, fabric, canvas
kenaf         Hibiscus cannabinus       stalk       twines, cords
bhang         Cannabis sativa           stalk       basket coasters
jute          Corchorus capsularis      stalk       bagging, wrapping fabric, padding cloth, carpets and rugs
flax          Linum usitatissimum       stalk       cambric, lace, bags
ramie         Boehmeria nivea           stalk       canvas, sewing threads, clothing, fine hose
sunn          Crotalaria juncea         stalk       rugs, paper, fishnets
nettle        Urtica dioica             stalk       cordage, sailcloth
cotton        Gossypium hirsutum        seed        clothing