Dataset statistics
Number of variables | 23 |
---|---|
Number of observations | 1000000 |
Missing cells | 926672 |
Missing cells (%) | 4.0% |
Duplicate rows | 0 |
Duplicate rows (%) | 0.0% |
Total size in memory | 58.2 MiB |
Average record size in memory | 61.0 B |
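The two derived figures in this table can be re-computed from the counts. The sketch below assumes that the missing-cell percentage is taken over all cells (variables times observations) and that the average record size is the total memory divided by the number of observations; the variable names are illustrative and not part of the report.

```python
# Values copied from the "Dataset statistics" table.
n_variables = 23
n_observations = 1_000_000
missing_cells = 926_672
total_size_mib = 58.2

# Assumed definition: share of missing cells among all cells (rows x columns).
missing_pct = 100 * missing_cells / (n_variables * n_observations)

# Assumed definition: total memory divided by the row count (1 MiB = 2**20 bytes).
avg_record_bytes = total_size_mib * 2**20 / n_observations

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

Under these assumptions both values round to the figures reported in the table.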
Variable types
CAT | 15 |
---|---|
NUM | 8 |
Dataset
Description | This report was generated with only one million observations (1.90% of the total). |
---|---|
URL | http://international.ipums.org/ |
Copyright | (c) IPUMS International 2020 |
perwt is highly correlated with hhwt | High correlation |
hhwt is highly correlated with perwt | High correlation |
year is highly correlated with country and 1 other fields | High correlation |
country is highly correlated with year and 1 other fields | High correlation |
sample is highly correlated with country and 1 other fields | High correlation |
edattaind is highly correlated with edattain | High correlation |
edattain is highly correlated with edattaind | High correlation |
empstatd is highly correlated with empstat | High correlation |
empstat is highly correlated with empstatd | High correlation |
internet has 179331 (17.9%) missing values | Missing |
age has 52174 (5.2%) missing values | Missing |
race has 478178 (47.8%) missing values | Missing |
indig has 216989 (21.7%) missing values | Missing |
df_index has unique values | Unique |
Reproduction
Analysis started | 2020-11-17 18:40:48.631205 |
---|---|
Analysis finished | 2020-11-17 18:42:01.485309 |
Duration | 1 minute and 12.85 seconds |
Software version | pandas-profiling v2.9.0 |
Download configuration | config.yaml |
df_index
Real number (ℝ≥0)
Distinct | 1000000 |
---|---|
Distinct (%) | 100.0% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 26284932.2 |
---|---|
Minimum | 121 |
Maximum | 52546643 |
Zeros | 0 |
Zeros (%) | 0.0% |
Memory size | 7.6 MiB |
Quantile statistics
Minimum | 121 |
---|---|
5-th percentile | 2628396 |
Q1 | 13142317 |
median | 26304903.5 |
Q3 | 39417358.75 |
95-th percentile | 49911590.75 |
Maximum | 52546643 |
Range | 52546522 |
Interquartile range (IQR) | 26275041.75 |
Descriptive statistics
Standard deviation | 15167175.84 |
---|---|
Coefficient of variation (CV) | 0.5770292928 |
Kurtosis | -1.200299036 |
Mean | 26284932.2 |
Median Absolute Deviation (MAD) | 13138195 |
Skewness | -0.001385719184 |
Sum | 2.62849322e+13 |
Variance | 2.300432229e+14 |
Monotonicity | Not monotonic |
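Several of the statistics above are simple functions of the others. The following sketch re-derives them for df_index from the reported minimum, maximum, quartiles, mean and standard deviation; it is a consistency check on the printed numbers, not the code the profiling tool runs.

```python
# Values copied from the df_index statistics.
minimum, maximum = 121, 52546643
q1, q3 = 13142317, 39417358.75
mean, std = 26284932.2, 15167175.84

value_range = maximum - minimum   # Range
iqr = q3 - q1                     # Interquartile range (IQR)
cv = std / mean                   # Coefficient of variation (CV)
variance = std ** 2               # Variance is the squared standard deviation

print(value_range, iqr, cv, variance)
```

The results agree with the reported Range, IQR, CV and Variance up to the rounding of the inputs.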
Value | Count | Frequency (%) | |
33558526 | 1 | < 0.1% | |
11021623 | 1 | < 0.1% | |
5331826 | 1 | < 0.1% | |
22127469 | 1 | < 0.1% | |
22123371 | 1 | < 0.1% | |
24222570 | 1 | < 0.1% | |
4189598 | 1 | < 0.1% | |
24240997 | 1 | < 0.1% | |
13757284 | 1 | < 0.1% | |
17945443 | 1 | < 0.1% | |
Other values (999990) | 999990 | > 99.9% |
Value | Count | Frequency (%) | |
121 | 1 | < 0.1% | |
200 | 1 | < 0.1% | |
230 | 1 | < 0.1% | |
343 | 1 | < 0.1% | |
375 | 1 | < 0.1% |
Value | Count | Frequency (%) | |
52546643 | 1 | < 0.1% | |
52546439 | 1 | < 0.1% | |
52546429 | 1 | < 0.1% | |
52546387 | 1 | < 0.1% | |
52546384 | 1 | < 0.1% |
country
Categorical
Distinct | 16 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 977.3 KiB |
Value | Count | Frequency (%) | |
brazil | 392607 | 39.3% | |
mexico | 216252 | 21.6% | |
colombia | 76130 | 7.6% | |
argentina | 75268 | 7.5% | |
peru | 52196 | 5.2% | |
venezuela | 43727 | 4.4% | |
chile | 28496 | 2.8% | |
ecuador | 27605 | 2.8% | |
dominican republic | 17865 | 1.8% | |
haiti | 16281 | 1.6% | |
Other values (6) | 53573 | 5.4% |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Length
Max length | 18 |
---|---|
Median length | 6 |
Mean length | 6.749365 |
Min length | 4 |
year
Categorical
Distinct | 8 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 976.9 KiB |
Value | Count | Frequency (%) | |
2010 | 519814 | 52.0% | |
2015 | 216252 | 21.6% | |
2005 | 86102 | 8.6% | |
2007 | 63170 | 6.3% | |
2001 | 55379 | 5.5% | |
2002 | 28496 | 2.8% | |
2003 | 16281 | 1.6% | |
2011 | 14506 | 1.5% |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Length
Max length | 4 |
---|---|
Median length | 4 |
Mean length | 4 |
Min length | 4 |
sample
Categorical
Distinct | 16 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 977.3 KiB |
Value | Count | Frequency (%) | |
brazil 2010 | 392607 | 39.3% | |
mexico 2015 | 216252 | 21.6% | |
colombia 2005 | 76130 | 7.6% | |
argentina 2010 | 75268 | 7.5% | |
peru 2007 | 52196 | 5.2% | |
venezuela 2001 | 43727 | 4.4% | |
chile 2002 | 28496 | 2.8% | |
ecuador 2010 | 27605 | 2.8% | |
dominican republic 2010 | 17865 | 1.8% | |
haiti 2003 | 16281 | 1.6% | |
Other values (6) | 53573 | 5.4% |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Length
Max length | 23 |
---|---|
Median length | 11 |
Mean length | 11.749365 |
Min length | 9 |
serial
Real number (ℝ≥0)
Distinct | 899049 |
---|---|
Distinct (%) | 89.9% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 1618102350 |
---|---|
Minimum | 1000 |
Maximum | 6192502000 |
Zeros | 0 |
Zeros (%) | 0.0% |
Memory size | 7.6 MiB |
Quantile statistics
Minimum | 1000 |
---|---|
5-th percentile | 46236900 |
Q1 | 300343000.8 |
median | 915235001 |
Q3 | 2507746500 |
95-th percentile | 5334000650 |
Maximum | 6192502000 |
Range | 6192501000 |
Interquartile range (IQR) | 2207403499 |
Descriptive statistics
Standard deviation | 1676355600 |
---|---|
Coefficient of variation (CV) | 1.036000968 |
Kurtosis | 0.2089770035 |
Mean | 1618102350 |
Median Absolute Deviation (MAD) | 782556500.5 |
Skewness | 1.140027561 |
Sum | 1.61810235e+15 |
Variance | 2.810168099e+18 |
Monotonicity | Not monotonic |
Value | Count | Frequency (%) | |
13350000 | 6 | < 0.1% | |
45449000 | 6 | < 0.1% | |
425223001 | 6 | < 0.1% | |
54600000 | 6 | < 0.1% | |
74039000 | 6 | < 0.1% | |
6876000 | 6 | < 0.1% | |
2435000 | 6 | < 0.1% | |
42452000 | 6 | < 0.1% | |
46315000 | 6 | < 0.1% | |
22929001 | 5 | < 0.1% | |
Other values (899039) | 999941 | > 99.9% |
Value | Count | Frequency (%) | |
1000 | 1 | < 0.1% | |
3000 | 1 | < 0.1% | |
4000 | 1 | < 0.1% | |
5001 | 1 | < 0.1% | |
8000 | 1 | < 0.1% |
Value | Count | Frequency (%) | |
6192502000 | 1 | < 0.1% | |
6192496000 | 1 | < 0.1% | |
6192463000 | 1 | < 0.1% | |
6192459000 | 1 | < 0.1% | |
6192449000 | 1 | < 0.1% |
persons
Real number (ℝ≥0)
Distinct | 40 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 4.662609 |
---|---|
Minimum | 1 |
Maximum | 50 |
Zeros | 0 |
Zeros (%) | 0.0% |
Memory size | 976.6 KiB |
Quantile statistics
Minimum | 1 |
---|---|
5-th percentile | 2 |
Q1 | 3 |
median | 4 |
Q3 | 6 |
95-th percentile | 9 |
Maximum | 50 |
Range | 49 |
Interquartile range (IQR) | 3 |
Descriptive statistics
Standard deviation | 2.329983636 |
---|---|
Coefficient of variation (CV) | 0.4997167113 |
Kurtosis | 7.948146601 |
Mean | 4.662609 |
Median Absolute Deviation (MAD) | 1 |
Skewness | 1.737270608 |
Sum | 4662609 |
Variance | 5.428823742 |
Monotonicity | Not monotonic |
Value | Count | Frequency (%) | |
4 | 227561 | 22.8% | |
3 | 178336 | 17.8% | |
5 | 176208 | 17.6% | |
6 | 111278 | 11.1% | |
2 | 105235 | 10.5% | |
7 | 69180 | 6.9% | |
8 | 37809 | 3.8% | |
1 | 35276 | 3.5% | |
9 | 22469 | 2.2% | |
10 | 14972 | 1.5% | |
Other values (30) | 21676 | 2.2% |
Value | Count | Frequency (%) | |
1 | 35276 | 3.5% | |
2 | 105235 | 10.5% | |
3 | 178336 | 17.8% | |
4 | 227561 | 22.8% | |
5 | 176208 | 17.6% |
Value | Count | Frequency (%) | |
50 | 1 | < 0.1% | |
44 | 1 | < 0.1% | |
43 | 1 | < 0.1% | |
42 | 4 | < 0.1% | |
40 | 3 | < 0.1% |
hhwt
Real number (ℝ≥0)
Distinct | 6175 |
---|---|
Distinct (%) | 0.6% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 9.8261696 |
---|---|
Minimum | 0 |
Maximum | 490 |
Zeros | 441 |
Zeros (%) | < 0.1% |
Memory size | 7.6 MiB |
Quantile statistics
Minimum | 0 |
---|---|
5-th percentile | 2 |
Q1 | 4.64 |
median | 10 |
Q3 | 10 |
95-th percentile | 22.39 |
Maximum | 490 |
Range | 490 |
Interquartile range (IQR) | 5.36 |
Descriptive statistics
Standard deviation | 9.38108173 |
---|---|
Coefficient of variation (CV) | 0.9547038279 |
Kurtosis | 123.6488759 |
Mean | 9.8261696 |
Median Absolute Deviation (MAD) | 2.55 |
Skewness | 7.28882983 |
Sum | 9826169.6 |
Variance | 88.00469443 |
Monotonicity | Not monotonic |
Value | Count | Frequency (%) | |
10 | 327279 | 32.7% | |
2 | 53766 | 5.4% | |
4 | 51667 | 5.2% | |
6 | 31752 | 3.2% | |
4.64 | 28724 | 2.9% | |
8 | 18909 | 1.9% | |
12 | 7750 | 0.8% | |
14 | 5681 | 0.6% | |
16 | 4311 | 0.4% | |
18 | 3415 | 0.3% | |
Other values (6165) | 466746 | 46.7% |
Value | Count | Frequency (%) | |
0 | 441 | < 0.1% | |
0.77 | 6 | < 0.1% | |
0.83 | 1 | < 0.1% | |
0.84 | 11 | < 0.1% | |
0.85 | 7 | < 0.1% |
Value | Count | Frequency (%) | |
490 | 3 | < 0.1% | |
478 | 1 | < 0.1% | |
410 | 3 | < 0.1% | |
394 | 1 | < 0.1% | |
376 | 2 | < 0.1% |
gq
Categorical
Distinct | 6 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 976.8 KiB |
Value | Count | Frequency (%) | |
households | 995001 | 99.5% | |
other group quarters | 2362 | 0.2% | |
institutions | 1420 | 0.1% | |
group quarters (collective), n.s | 666 | 0.1% | |
1-person unit created by splitting large household | 441 | < 0.1% | |
unknown/group quarters not identified | 110 | < 0.1% |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Length
Max length | 50 |
---|---|
Median length | 10 |
Mean length | 10.061722 |
Min length | 10 |
geolev1
Real number (ℝ≥0)
Distinct | 312 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 261391.6515 |
---|---|
Minimum | 32002 |
Maximum | 862023 |
Zeros | 0 |
Zeros (%) | 0.0% |
Memory size | 3.8 MiB |
Quantile statistics
Minimum | 32002 |
---|---|
5-th percentile | 32038 |
Q1 | 76031 |
median | 170005 |
Q3 | 484015 |
95-th percentile | 604025 |
Maximum | 862023 |
Range | 830021 |
Interquartile range (IQR) | 407984 |
Descriptive statistics
Standard deviation | 234868.0863 |
---|---|
Coefficient of variation (CV) | 0.8985294095 |
Kurtosis | -0.1831824525 |
Mean | 261391.6515 |
Median Absolute Deviation (MAD) | 93982 |
Skewness | 0.9480444751 |
Sum | 2.613916515e+11 |
Variance | 5.516301796e+10 |
Monotonicity | Not monotonic |
Value | Count | Frequency (%) | |
76035 | 69555 | 7.0% | |
76031 | 47966 | 4.8% | |
76029 | 29665 | 3.0% | |
32006 | 29422 | 2.9% | |
76043 | 26089 | 2.6% | |
76041 | 24756 | 2.5% | |
76033 | 21626 | 2.2% | |
484020 | 21178 | 2.1% | |
484030 | 19518 | 2.0% | |
218009 | 18922 | 1.9% | |
Other values (302) | 691303 | 69.1% |
Value | Count | Frequency (%) | |
32002 | 5516 | 0.6% | |
32006 | 29422 | 2.9% | |
32010 | 681 | 0.1% | |
32014 | 6230 | 0.6% | |
32018 | 1777 | 0.2% |
Value | Count | Frequency (%) | |
862023 | 5762 | 0.6% | |
862022 | 1000 | 0.1% | |
862021 | 1164 | 0.1% | |
862020 | 1877 | 0.2% | |
862019 | 1469 | 0.1% |
internet
Categorical
Distinct | 4 |
---|---|
Distinct (%) | < 0.1% |
Missing | 179331 |
Missing (%) | 17.9% |
Memory size | 976.8 KiB |
Value | Count | Frequency (%) | |
no | 384908 | 38.5% | |
niu (not in universe) | 271723 | 27.2% | |
yes | 159985 | 16.0% | |
unknown | 4053 | 0.4% | |
(Missing) | 179331 | 17.9% |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Length
Max length | 21 |
---|---|
Median length | 3 |
Mean length | 7.522318 |
Min length | 2 |
computer
Categorical
Distinct | 4 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 976.8 KiB |
Value | Count | Frequency (%) | |
no | 735566 | 73.6% | |
yes | 256429 | 25.6% | |
niu (not in universe) | 4519 | 0.5% | |
unknown/missing | 3486 | 0.3% |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Length
Max length | 21 |
---|---|
Median length | 2 |
Mean length | 2.387608 |
Min length | 2 |
pernum
Real number (ℝ≥0)
Distinct | 36 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 2.829162 |
---|---|
Minimum | 1 |
Maximum | 46 |
Zeros | 0 |
Zeros (%) | 0.0% |
Memory size | 976.6 KiB |
Quantile statistics
Minimum | 1 |
---|---|
5-th percentile | 1 |
Q1 | 1 |
median | 2 |
Q3 | 4 |
95-th percentile | 6 |
Maximum | 46 |
Range | 45 |
Interquartile range (IQR) | 3 |
Descriptive statistics
Standard deviation | 1.879112532 |
---|---|
Coefficient of variation (CV) | 0.6641940378 |
Kurtosis | 7.18633033 |
Mean | 2.829162 |
Median Absolute Deviation (MAD) | 1 |
Skewness | 1.795066969 |
Sum | 2829162 |
Variance | 3.531063909 |
Monotonicity | Not monotonic |
Value | Count | Frequency (%) | |
1 | 278237 | 27.8% | |
2 | 243297 | 24.3% | |
3 | 190857 | 19.1% | |
4 | 130982 | 13.1% | |
5 | 74145 | 7.4% | |
6 | 38625 | 3.9% | |
7 | 20123 | 2.0% | |
8 | 10316 | 1.0% | |
9 | 5680 | 0.6% | |
10 | 3198 | 0.3% | |
Other values (26) | 4540 | 0.5% |
Value | Count | Frequency (%) | |
1 | 278237 | 27.8% | |
2 | 243297 | 24.3% | |
3 | 190857 | 19.1% | |
4 | 130982 | 13.1% | |
5 | 74145 | 7.4% |
Value | Count | Frequency (%) | |
46 | 1 | < 0.1% | |
39 | 1 | < 0.1% | |
37 | 1 | < 0.1% | |
35 | 1 | < 0.1% | |
32 | 2 | < 0.1% |
perwt
Real number (ℝ≥0)
Distinct | 6174 |
---|---|
Distinct (%) | 0.6% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 9.83114623 |
---|---|
Minimum | 0.77 |
Maximum | 490 |
Zeros | 0 |
Zeros (%) | 0.0% |
Memory size | 7.6 MiB |
Quantile statistics
Minimum | 0.77 |
---|---|
5-th percentile | 2 |
Q1 | 4.65 |
median | 10 |
Q3 | 10 |
95-th percentile | 22.39 |
Maximum | 490 |
Range | 489.23 |
Interquartile range (IQR) | 5.35 |
Descriptive statistics
Standard deviation | 9.379103761 |
---|---|
Coefficient of variation (CV) | 0.9540193525 |
Kurtosis | 123.7429894 |
Mean | 9.83114623 |
Median Absolute Deviation (MAD) | 2.54 |
Skewness | 7.292926847 |
Sum | 9831146.23 |
Variance | 87.96758735 |
Monotonicity | Not monotonic |
Value | Count | Frequency (%) | |
10 | 327715 | 32.8% | |
2 | 53766 | 5.4% | |
4 | 51667 | 5.2% | |
6 | 31752 | 3.2% | |
4.64 | 28724 | 2.9% | |
8 | 18909 | 1.9% | |
12 | 7750 | 0.8% | |
14 | 5681 | 0.6% | |
16 | 4311 | 0.4% | |
18 | 3415 | 0.3% | |
Other values (6164) | 466310 | 46.6% |
Value | Count | Frequency (%) | |
0.77 | 6 | < 0.1% | |
0.83 | 1 | < 0.1% | |
0.84 | 11 | < 0.1% | |
0.85 | 7 | < 0.1% | |
0.86 | 1 | < 0.1% |
Value | Count | Frequency (%) | |
490 | 3 | < 0.1% | |
478 | 1 | < 0.1% | |
410 | 3 | < 0.1% | |
394 | 1 | < 0.1% | |
376 | 2 | < 0.1% |
age
Real number (ℝ≥0)
Distinct | 97 |
---|---|
Distinct (%) | < 0.1% |
Missing | 52174 |
Missing (%) | 5.2% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 31.67749249 |
---|---|
Minimum | 3 |
Maximum | 99 |
Zeros | 0 |
Zeros (%) | 0.0% |
Memory size | 7.6 MiB |
Quantile statistics
Minimum | 3 |
---|---|
5-th percentile | 5 |
Q1 | 15 |
median | 28 |
Q3 | 45 |
95-th percentile | 70 |
Maximum | 99 |
Range | 96 |
Interquartile range (IQR) | 30 |
Descriptive statistics
Standard deviation | 20.2215345 |
---|---|
Coefficient of variation (CV) | 0.63835654 |
Kurtosis | -0.3792163021 |
Mean | 31.67749249 |
Median Absolute Deviation (MAD) | 15 |
Skewness | 0.6225786121 |
Sum | 30024751 |
Variance | 408.9104577 |
Monotonicity | Not monotonic |
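Because 52174 observations have no value for this variable, the reported mean appears to be taken over the non-missing observations only. A short check, assuming that the mean is the sum divided by the non-missing count (names are illustrative):

```python
# Values copied from the statistics of this variable.
n_observations = 1_000_000
n_missing = 52_174
value_sum = 30_024_751

n_valid = n_observations - n_missing        # observations with a value
mean_over_valid = value_sum / n_valid       # assumed definition of the reported mean
mean_over_all = value_sum / n_observations  # for contrast: missing rows counted in the denominator

print(n_valid, mean_over_valid, mean_over_all)
```

Only the first ratio reproduces the reported mean of about 31.68; dividing by all one million rows gives about 30.02.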
Value | Count | Frequency (%) | |
10 | 20524 | 2.1% | |
14 | 20128 | 2.0% | |
12 | 20062 | 2.0% | |
15 | 20007 | 2.0% | |
13 | 19701 | 2.0% | |
11 | 19421 | 1.9% | |
9 | 19243 | 1.9% | |
16 | 18980 | 1.9% | |
18 | 18972 | 1.9% | |
8 | 18937 | 1.9% | |
Other values (87) | 751851 | 75.2% | |
(Missing) | 52174 | 5.2% |
Value | Count | Frequency (%) | |
3 | 18187 | 1.8% | |
4 | 18231 | 1.8% | |
5 | 18468 | 1.8% | |
6 | 18250 | 1.8% | |
7 | 18694 | 1.9% |
Value | Count | Frequency (%) | |
99 | 57 | < 0.1% | |
98 | 120 | < 0.1% | |
97 | 106 | < 0.1% | |
96 | 152 | < 0.1% | |
95 | 203 | < 0.1% |
sex
Categorical
Distinct | 2 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 976.7 KiB |
Value | Count | Frequency (%) | |
female | 507829 | 50.8% | |
male | 492171 | 49.2% |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Length
Max length | 6 |
---|---|
Median length | 6 |
Mean length | 5.015658 |
Min length | 4 |
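The length statistics describe the string length of each category label, weighted by how often the label occurs. A small sketch, assuming that definition, reproduces the values for sex from the counts in its frequency table:

```python
# Counts copied from the frequency table for sex.
counts = {"female": 507_829, "male": 492_171}

total = sum(counts.values())
# Assumed definition: mean of len(label) over all rows, i.e. weighted by count.
mean_length = sum(len(label) * n for label, n in counts.items()) / total
min_length = min(len(label) for label in counts)
max_length = max(len(label) for label in counts)

print(mean_length, min_length, max_length)
```

The median length of 6 follows the same logic: "female" (6 characters) accounts for more than half of the rows.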
race
Categorical
Distinct | 13 |
---|---|
Distinct (%) | < 0.1% |
Missing | 478178 |
Missing (%) | 47.8% |
Memory size | 977.3 KiB |
Value | Count | Frequency (%) | |
white | 261429 | 26.1% | |
brown (brazil) | 173620 | 17.4% | |
black | 36413 | 3.6% | |
mestizo (indigenous and white) | 29488 | 2.9% | |
indigenous | 9068 | 0.9% | |
asian | 4004 | 0.4% | |
unknown | 3078 | 0.3% | |
montubio (ecuador) | 2045 | 0.2% | |
afro-ecuadorian | 1146 | 0.1% | |
mulatto (black and white) | 1086 | 0.1% | |
Other values (3) | 445 | < 0.1% | |
(Missing) | 478178 | 47.8% |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Length
Max length | 30 |
---|---|
Median length | 5 |
Mean length | 6.455739 |
Min length | 3 |
indig
Categorical
Distinct | 3 |
---|---|
Distinct (%) | < 0.1% |
Missing | 216989 |
Missing (%) | 21.7% |
Memory size | 976.7 KiB |
Value | Count | Frequency (%) | |
no | 689348 | 68.9% | |
yes | 88443 | 8.8% | |
unknown | 5220 | 0.5% | |
(Missing) | 216989 | 21.7% |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Length
Max length | 7 |
---|---|
Median length | 2 |
Mean length | 2.331532 |
Min length | 2 |
lit
Categorical
Distinct | 4 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 976.8 KiB |
Value | Count | Frequency (%) | |
yes, literate | 796402 | 79.6% | |
no, illiterate | 120876 | 12.1% | |
niu (not in universe) | 78998 | 7.9% | |
unknown/missing | 3724 | 0.4% |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Length
Max length | 21 |
---|---|
Median length | 13 |
Mean length | 13.760308 |
Min length | 13 |
edattain
Categorical
Distinct | 6 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 976.8 KiB |
Value | Count | Frequency (%) | |
less than primary completed | 429963 | 43.0% | |
primary completed | 313201 | 31.3% | |
secondary completed | 167529 | 16.8% | |
university completed | 47850 | 4.8% | |
niu (not in universe) | 36541 | 3.7% | |
unknown | 4916 | 0.5% |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Length
Max length | 27 |
---|---|
Median length | 20 |
Mean length | 21.875242 |
Min length | 7 |
edattaind
Categorical
Distinct | 14 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 977.3 KiB |
Value | Count | Frequency (%) | |
some primary completed | 225020 | 22.5% | |
primary (6 yrs) completed | 182641 | 18.3% | |
no schooling | 159289 | 15.9% | |
lower secondary general completed | 111377 | 11.1% | |
secondary, general track completed | 108353 | 10.8% | |
university completed | 47850 | 4.8% | |
primary (4 yrs) completed | 45654 | 4.6% | |
niu (not in universe) | 36541 | 3.7% | |
some college completed | 34733 | 3.5% | |
post-secondary technical education | 20280 | 2.0% | |
Other values (4) | 28262 | 2.8% |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Length
Max length | 36 |
---|---|
Median length | 22 |
Mean length | 23.834486 |
Min length | 12 |
empstat
Categorical
Distinct | 5 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 976.8 KiB |
Value | Count | Frequency (%) | |
inactive | 397926 | 39.8% | |
employed | 376517 | 37.7% | |
niu (not in universe) | 180669 | 18.1% | |
unemployed | 40352 | 4.0% | |
unknown/missing | 4536 | 0.5% |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Length
Max length | 21 |
---|---|
Median length | 8 |
Mean length | 10.461153 |
Min length | 8 |
empstatd
Categorical
Distinct | 26 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 978.0 KiB |
Value | Count | Frequency (%) | |
at work | 349999 | 35.0% | |
niu (not in universe) | 180669 | 18.1% | |
inactive (not in labor force) | 167415 | 16.7% | |
housework | 97386 | 9.7% | |
in school | 83505 | 8.4% | |
inactive, other reasons | 32795 | 3.3% | |
unemployed, not specified | 31308 | 3.1% | |
have job, not at work in reference period | 10144 | 1.0% | |
employed, not specified | 8645 | 0.9% | |
retirees and living on rent | 5630 | 0.6% | |
Other values (16) | 32504 | 3.3% |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Length
Max length | 42 |
---|---|
Median length | 9 |
Mean length | 15.759659 |
Min length | 7 |
labforce
Categorical
Distinct | 4 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 976.8 KiB |
Value | Count | Frequency (%) | |
yes, in the labor force | 408514 | 40.9% | |
no, not in the labor force | 306144 | 30.6% | |
niu (not in universe) | 281721 | 28.2% | |
unknown | 3621 | 0.4% |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Length
Max length | 26 |
---|---|
Median length | 23 |
Mean length | 23.297054 |
Min length | 7 |
Pearson's r
Pearson's correlation coefficient (r) measures the linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, so for a linear relationship the steepness of the line does not affect r. To calculate r for two variables X and Y, divide the covariance of X and Y by the product of their standard deviations.
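The definition above (covariance divided by the product of the standard deviations) can be sketched in a few lines of plain Python. The function name `pearson_r` and the sample data are illustrative and are not taken from the report or from any library.

```python
import math

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # The 1/n factors of covariance and variances cancel, so plain sums are enough.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when a variable is constant")
    return cov / math.sqrt(var_x * var_y)

x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
r = pearson_r(x, y)
# Shifting and positively rescaling y leaves r unchanged.
r_rescaled = pearson_r(x, [3 * v + 7 for v in y])
print(r, r_rescaled)
```

Both calls give the same coefficient (up to floating-point rounding), which illustrates the invariance under location and scale changes.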
Spearman's ρ
Spearman's rank correlation coefficient (ρ) measures the monotonic correlation between two variables and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation. To calculate ρ for two variables X and Y, divide the covariance of the rank variables of X and Y by the product of their standard deviations.
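As a minimal sketch of this recipe, the code below ranks both variables and then applies the covariance-over-standard-deviations formula to the ranks. Giving tied values the average of their positions is an assumption of this sketch (a common convention); the helper names are illustrative.

```python
import math

def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Pearson correlation computed on the rank variables of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]   # nonlinear but strictly increasing
rho = spearman_rho(x, y)
print(rho)
```

Because y is a strictly increasing function of x, the ranks coincide and ρ is 1 (up to rounding), even though the relationship is not linear.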
Kendall's τ
Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation. To calculate τ for two variables X and Y, count the concordant and discordant pairs of observations; τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
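The pair-counting rule can be sketched directly. This is the simple variant described in the text (often called tau-a), in which tied pairs count as neither concordant nor discordant; library implementations commonly apply a tie correction instead, and the report does not state which variant it uses. The function name is illustrative.

```python
from itertools import combinations

def kendall_tau(x, y):
    """(concordant - discordant) / total number of pairs, without tie correction."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    concordant = discordant = pairs = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        pairs += 1
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1    # both variables order the pair the same way
        elif sign < 0:
            discordant += 1    # the variables order the pair in opposite ways
    return (concordant - discordant) / pairs

x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
tau = kendall_tau(x, y)
print(tau)
```

For this sample there are 10 pairs, 8 concordant and 2 discordant, so τ = (8 - 2) / 10. The quadratic loop is fine for illustration but would be slow on a million rows.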
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.
Cramér's V (φc)
Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
First rows
| | df_index | country | year | sample | serial | persons | hhwt | gq | geolev1 | internet | computer | pernum | perwt | age | sex | race | indig | lit | edattain | edattaind | empstat | empstatd | labforce |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 26329292 | colombia | 2005 | colombia 2005 | 5.881000e+07 | 5 | 6.16 | households | 170005 | NaN | no | 5 | 6.16 | 11.0 | male | white | no | yes, literate | primary completed | primary (5 yrs) completed | inactive | in school | niu (not in universe) |
1 | 45863653 | nicaragua | 2005 | nicaragua 2005 | 2.634000e+07 | 6 | 10.00 | households | 558025 | no | no | 1 | 10.00 | 53.0 | female | NaN | no | yes, literate | less than primary completed | some primary completed | employed | at work | yes, in the labor force |
2 | 51297159 | venezuela | 2001 | venezuela 2001 | 2.886910e+08 | 7 | 10.00 | households | 862013 | no | no | 1 | 10.00 | 52.0 | female | NaN | NaN | no, illiterate | less than primary completed | no schooling | unknown/missing | unknown/missing | unknown |
3 | 2009550 | argentina | 2010 | argentina 2010 | 6.480550e+08 | 5 | 10.00 | households | 32014 | NaN | no | 3 | 10.00 | 19.0 | male | NaN | NaN | yes, literate | primary completed | primary (6 yrs) completed | employed | at work | yes, in the labor force |
4 | 17220209 | brazil | 2010 | brazil 2010 | 3.834011e+09 | 5 | 3.26 | households | 76035 | yes | yes | 4 | 3.26 | 17.0 | female | white | no | yes, literate | secondary completed | secondary, general track completed | inactive | inactive (not in labor force) | no, not in the labor force |
5 | 45452679 | mexico | 2015 | mexico 2015 | 2.853132e+09 | 5 | 4.00 | households | 484031 | no | no | 4 | 4.00 | 26.0 | female | NaN | yes | yes, literate | university completed | university completed | employed | at work | yes, in the labor force |
6 | 4182065 | brazil | 2010 | brazil 2010 | 6.362100e+07 | 7 | 5.63 | households | 76012 | niu (not in universe) | no | 5 | 5.63 | 22.0 | male | white | no | yes, literate | primary completed | primary (6 yrs) completed | inactive | inactive (not in labor force) | no, not in the labor force |
7 | 761071 | argentina | 2010 | argentina 2010 | 2.650540e+08 | 2 | 10.00 | households | 32006 | NaN | no | 2 | 10.00 | 50.0 | male | NaN | NaN | yes, literate | primary completed | primary (6 yrs) completed | employed | at work | yes, in the labor force |
8 | 31054217 | dominican republic | 2010 | dominican republic 2010 | 1.624880e+08 | 6 | 10.00 | households | 214008 | no | no | 6 | 10.00 | 10.0 | male | NaN | NaN | yes, literate | less than primary completed | some primary completed | inactive | in school | niu (not in universe) |
9 | 36215848 | mexico | 2015 | mexico 2015 | 4.576560e+08 | 8 | 28.00 | households | 484009 | yes | yes | 8 | 28.00 | 50.0 | female | NaN | no | yes, literate | secondary completed | secondary, general track completed | employed | at work | yes, in the labor force |
Last rows
| | df_index | country | year | sample | serial | persons | hhwt | gq | geolev1 | internet | computer | pernum | perwt | age | sex | race | indig | lit | edattain | edattaind | empstat | empstatd | labforce |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
999990 | 34166301 | haiti | 2003 | haiti 2003 | 1.636770e+08 | 8 | 10.00 | households | 332006 | NaN | no | 4 | 10.00 | 12.0 | female | NaN | NaN | no, illiterate | less than primary completed | no schooling | inactive | housework | niu (not in universe) |
999991 | 6186480 | brazil | 2010 | brazil 2010 | 5.580270e+08 | 6 | 10.39 | households | 76021 | niu (not in universe) | no | 2 | 10.39 | 7.0 | male | white | no | yes, literate | less than primary completed | some primary completed | niu (not in universe) | niu (not in universe) | niu (not in universe) |
999992 | 30320964 | costa rica | 2011 | costa rica 2011 | 5.718100e+07 | 5 | 10.00 | households | 188002 | no | no | 1 | 10.00 | 38.0 | male | white | no | yes, literate | less than primary completed | some primary completed | employed | at work | yes, in the labor force |
999993 | 41488326 | mexico | 2015 | mexico 2015 | 1.817698e+09 | 1 | 2.00 | households | 484020 | no | no | 1 | 2.00 | 57.0 | female | NaN | yes | yes, literate | less than primary completed | some primary completed | inactive | housework | no, not in the labor force |
999994 | 16790221 | brazil | 2010 | brazil 2010 | 3.696445e+09 | 10 | 2.48 | households | 76035 | niu (not in universe) | no | 1 | 2.48 | 39.0 | male | white | no | no, illiterate | less than primary completed | some primary completed | employed | at work | yes, in the labor force |
999995 | 15875195 | brazil | 2010 | brazil 2010 | 3.403780e+09 | 4 | 15.96 | households | 76033 | niu (not in universe) | no | 3 | 15.96 | 26.0 | male | brown (brazil) | no | yes, literate | primary completed | lower secondary general completed | employed | at work | yes, in the labor force |
999996 | 22126704 | brazil | 2010 | brazil 2010 | 5.387448e+09 | 5 | 11.96 | households | 76043 | no | yes | 3 | 11.96 | NaN | female | black | no | niu (not in universe) | less than primary completed | no schooling | niu (not in universe) | niu (not in universe) | niu (not in universe) |
999997 | 25764931 | chile | 2002 | chile 2002 | 3.630960e+08 | 4 | 10.00 | households | 152131 | no | no | 3 | 10.00 | 9.0 | female | NaN | no | no, illiterate | less than primary completed | some primary completed | niu (not in universe) | niu (not in universe) | niu (not in universe) |
999998 | 49665403 | el salvador | 2007 | el salvador 2007 | 9.858900e+07 | 5 | 10.00 | households | 222006 | no | no | 5 | 10.00 | 4.0 | male | mestizo (indigenous and white) | no | niu (not in universe) | niu (not in universe) | niu (not in universe) | employed | marginally employed | niu (not in universe) |
999999 | 6642712 | brazil | 2010 | brazil 2010 | 6.748320e+08 | 5 | 11.97 | households | 76022 | niu (not in universe) | no | 2 | 11.97 | 30.0 | female | brown (brazil) | no | yes, literate | secondary completed | secondary, general track completed | employed | at work | yes, in the labor force |