Overview

Dataset statistics

Number of variables: 23
Number of observations: 1000000
Missing cells: 926672
Missing cells (%): 4.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 58.2 MiB
Average record size in memory: 61.0 B

Variable types

CAT: 15
NUM: 8

Dataset

Description: This report was generated using only one million observations (1.90% of the total).
URL: http://international.ipums.org/
Copyright: (c) IPUMS International 2020

Warnings

perwt is highly correlated with hhwt [High correlation]
hhwt is highly correlated with perwt [High correlation]
year is highly correlated with country and 1 other field [High correlation]
country is highly correlated with year and 1 other field [High correlation]
sample is highly correlated with country and 1 other field [High correlation]
edattaind is highly correlated with edattain [High correlation]
edattain is highly correlated with edattaind [High correlation]
empstatd is highly correlated with empstat [High correlation]
empstat is highly correlated with empstatd [High correlation]
internet has 179331 (17.9%) missing values [Missing]
age has 52174 (5.2%) missing values [Missing]
race has 478178 (47.8%) missing values [Missing]
indig has 216989 (21.7%) missing values [Missing]
df_index has unique values [Unique]
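The overview's missing-cell figures can be reconciled with these warnings: the four variables flagged as Missing account for every missing cell, and the percentage is taken over all variable-by-observation cells. The check below only redoes that arithmetic with numbers copied from this report; it assumes, as the per-variable sections indicate, that every other variable has zero missing values.

```python
# Missing-cell counts copied from the "Missing" warnings.
# Assumption: every other variable reports Missing 0, as its section shows.
missing_by_variable = {
    "internet": 179_331,
    "age": 52_174,
    "race": 478_178,
    "indig": 216_989,
}
n_variables = 23
n_observations = 1_000_000

missing_cells = sum(missing_by_variable.values())
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

print(missing_cells)          # 926672, the overview's "Missing cells"
print(f"{missing_pct:.1f}%")  # 4.0%, the overview's "Missing cells (%)"
```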

Reproduction

Analysis started: 2020-11-17 18:40:48.631205
Analysis finished: 2020-11-17 18:42:01.485309
Duration: 1 minute and 12.85 seconds
Software version: pandas-profiling v2.9.0
Download configuration: config.yaml

Variables

df_index
Real number (ℝ≥0)

UNIQUE

Distinct: 1000000
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 26284932.2
Minimum: 121
Maximum: 52546643
Zeros: 0
Zeros (%): 0.0%
Memory size: 7.6 MiB

Quantile statistics

Minimum: 121
5th percentile: 2628396
Q1: 13142317
median: 26304903.5
Q3: 39417358.75
95th percentile: 49911590.75
Maximum: 52546643
Range: 52546522
Interquartile range (IQR): 26275041.75

Descriptive statistics

Standard deviation: 15167175.84
Coefficient of variation (CV): 0.5770292928
Kurtosis: -1.200299036
Mean: 26284932.2
Median Absolute Deviation (MAD): 13138195
Skewness: -0.001385719184
Sum: 2.62849322e+13
Variance: 2.300432229e+14
Monotonicity: Not monotonic
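Several entries in the quantile and descriptive tables are derived from others: Range = Maximum - Minimum, IQR = Q3 - Q1, CV = standard deviation / mean, and Variance = (standard deviation)². The snippet below recomputes them for df_index from the rounded values printed above, so the last digits can differ slightly from the report's own full-precision results.

```python
# df_index statistics copied from the tables above (already rounded).
minimum, maximum = 121, 52_546_643
q1, q3 = 13_142_317, 39_417_358.75
mean, std = 26_284_932.2, 15_167_175.84

value_range = maximum - minimum  # Range = Maximum - Minimum
iqr = q3 - q1                    # Interquartile range = Q3 - Q1
cv = std / mean                  # Coefficient of variation = std / mean
variance = std ** 2              # Variance = std squared

print(value_range)        # 52546522
print(iqr)                # 26275041.75
print(round(cv, 6))       # 0.577029
print(f"{variance:.6e}")  # 2.300432e+14
```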
Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
33558526 | 1 | < 0.1%
11021623 | 1 | < 0.1%
5331826 | 1 | < 0.1%
22127469 | 1 | < 0.1%
22123371 | 1 | < 0.1%
24222570 | 1 | < 0.1%
4189598 | 1 | < 0.1%
24240997 | 1 | < 0.1%
13757284 | 1 | < 0.1%
17945443 | 1 | < 0.1%
Other values (999990) | 999990 | > 99.9%

Minimum 5 values
Value | Count | Frequency (%)
121 | 1 | < 0.1%
200 | 1 | < 0.1%
230 | 1 | < 0.1%
343 | 1 | < 0.1%
375 | 1 | < 0.1%

Maximum 5 values
Value | Count | Frequency (%)
52546643 | 1 | < 0.1%
52546439 | 1 | < 0.1%
52546429 | 1 | < 0.1%
52546387 | 1 | < 0.1%
52546384 | 1 | < 0.1%

country
Categorical

HIGH CORRELATION

Distinct: 16
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 977.3 KiB
brazil: 392607
mexico: 216252
colombia: 76130
argentina: 75268
peru: 52196
Other values (11): 187547
Value | Count | Frequency (%)
brazil | 392607 | 39.3%
mexico | 216252 | 21.6%
colombia | 76130 | 7.6%
argentina | 75268 | 7.5%
peru | 52196 | 5.2%
venezuela | 43727 | 4.4%
chile | 28496 | 2.8%
ecuador | 27605 | 2.8%
dominican republic | 17865 | 1.8%
haiti | 16281 | 1.6%
Other values (6) | 53573 | 5.4%
Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%
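In these tables, Distinct counts how many different values a column takes, while Unique counts the values that occur exactly once. Every country appears many times, so Unique is 0 even though Distinct is 16. A small sketch of the two counts on a hypothetical toy column; it mirrors the definitions as read from this report, not pandas-profiling's internal code.

```python
from collections import Counter


def distinct_and_unique(values):
    """Return (distinct, unique) for a column.

    distinct: number of different values.
    unique:   number of values that occur exactly once.
    """
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique


# Hypothetical toy column, not taken from the IPUMS extract.
toy_country = ["brazil", "brazil", "mexico", "peru", "mexico"]
print(distinct_and_unique(toy_country))  # (3, 1): only "peru" occurs once
```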
Histogram of lengths of the category

Length

Max length: 18
Median length: 6
Mean length: 6.749365
Min length: 4
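The length statistics appear to be computed over every row rather than over the distinct categories: sample's lengths (max 23, median 11, mean 11.749365, min 9) are exactly five characters more than country's, which matches appending a space and a four-digit year to every row. The sketch below computes the same four statistics on a hypothetical toy column, so frequent categories weigh more; it is an illustration of that reading, not pandas-profiling's code.

```python
from statistics import mean, median


def length_summary(column):
    """Max/median/mean/min string length, computed over every row."""
    lengths = [len(value) for value in column]
    return {
        "max": max(lengths),
        "median": median(lengths),
        "mean": mean(lengths),
        "min": min(lengths),
    }


# Hypothetical toy column: "brazil" occurs three times, so it dominates.
toy = ["brazil", "brazil", "brazil", "peru"]
print(length_summary(toy))  # {'max': 6, 'median': 6.0, 'mean': 5.5, 'min': 4}

# Appending " 2010" (as in the sample column) shifts every statistic by 5.
shifted = length_summary([value + " 2010" for value in toy])
print(shifted)  # {'max': 11, 'median': 11.0, 'mean': 10.5, 'min': 9}
```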

year
Categorical

HIGH CORRELATION

Distinct: 8
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 976.9 KiB
2010: 519814
2015: 216252
2005: 86102
2007: 63170
2001: 55379
Other values (3): 59283
Value | Count | Frequency (%)
2010 | 519814 | 52.0%
2015 | 216252 | 21.6%
2005 | 86102 | 8.6%
2007 | 63170 | 6.3%
2001 | 55379 | 5.5%
2002 | 28496 | 2.8%
2003 | 16281 | 1.6%
2011 | 14506 | 1.5%
Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%
Histogram of lengths of the category

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

sample
Categorical

HIGH CORRELATION

Distinct: 16
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 977.3 KiB
brazil 2010: 392607
mexico 2015: 216252
colombia 2005: 76130
argentina 2010: 75268
peru 2007: 52196
Other values (11): 187547
Value | Count | Frequency (%)
brazil 2010 | 392607 | 39.3%
mexico 2015 | 216252 | 21.6%
colombia 2005 | 76130 | 7.6%
argentina 2010 | 75268 | 7.5%
peru 2007 | 52196 | 5.2%
venezuela 2001 | 43727 | 4.4%
chile 2002 | 28496 | 2.8%
ecuador 2010 | 27605 | 2.8%
dominican republic 2010 | 17865 | 1.8%
haiti 2003 | 16281 | 1.6%
Other values (6) | 53573 | 5.4%
Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%
Histogram of lengths of the category

Length

Max length: 23
Median length: 11
Mean length: 11.749365
Min length: 9

serial
Real number (ℝ≥0)

Distinct: 899049
Distinct (%): 89.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1618102350
Minimum: 1000
Maximum: 6192502000
Zeros: 0
Zeros (%): 0.0%
Memory size: 7.6 MiB

Quantile statistics

Minimum: 1000
5th percentile: 46236900
Q1: 300343000.8
median: 915235001
Q3: 2507746500
95th percentile: 5334000650
Maximum: 6192502000
Range: 6192501000
Interquartile range (IQR): 2207403499

Descriptive statistics

Standard deviation: 1676355600
Coefficient of variation (CV): 1.036000968
Kurtosis: 0.2089770035
Mean: 1618102350
Median Absolute Deviation (MAD): 782556500.5
Skewness: 1.140027561
Sum: 1.61810235e+15
Variance: 2.810168099e+18
Monotonicity: Not monotonic
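Monotonicity describes whether a column's values, taken in row order, never decrease or never increase. Every variable in this report is "Not monotonic". The pure-Python sketch below classifies a sequence; the four labels other than "Not monotonic" are assumed wording, not copied from pandas-profiling.

```python
def monotonicity(values):
    """Classify a sequence (in row order) by its monotonicity.

    Only "Not monotonic" appears in this report; the other labels
    are assumed wording.
    """
    pairs = list(zip(values, values[1:]))
    if all(a < b for a, b in pairs):
        return "Strictly increasing"
    if all(a <= b for a, b in pairs):
        return "Increasing"
    if all(a > b for a, b in pairs):
        return "Strictly decreasing"
    if all(a >= b for a, b in pairs):
        return "Decreasing"
    return "Not monotonic"


# Hypothetical serial-like values in row order.
print(monotonicity([1000, 3000, 4000]))  # Strictly increasing
print(monotonicity([1000, 3000, 2000]))  # Not monotonic
```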
Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
13350000 | 6 | < 0.1%
45449000 | 6 | < 0.1%
425223001 | 6 | < 0.1%
54600000 | 6 | < 0.1%
74039000 | 6 | < 0.1%
6876000 | 6 | < 0.1%
2435000 | 6 | < 0.1%
42452000 | 6 | < 0.1%
46315000 | 6 | < 0.1%
22929001 | 5 | < 0.1%
Other values (899039) | 999941 | > 99.9%

Minimum 5 values
Value | Count | Frequency (%)
1000 | 1 | < 0.1%
3000 | 1 | < 0.1%
4000 | 1 | < 0.1%
5001 | 1 | < 0.1%
8000 | 1 | < 0.1%

Maximum 5 values
Value | Count | Frequency (%)
6192502000 | 1 | < 0.1%
6192496000 | 1 | < 0.1%
6192463000 | 1 | < 0.1%
6192459000 | 1 | < 0.1%
6192449000 | 1 | < 0.1%

persons
Real number (ℝ≥0)

Distinct: 40
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.662609
Minimum: 1
Maximum: 50
Zeros: 0
Zeros (%): 0.0%
Memory size: 976.6 KiB

Quantile statistics

Minimum: 1
5th percentile: 2
Q1: 3
median: 4
Q3: 6
95th percentile: 9
Maximum: 50
Range: 49
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 2.329983636
Coefficient of variation (CV): 0.4997167113
Kurtosis: 7.948146601
Mean: 4.662609
Median Absolute Deviation (MAD): 1
Skewness: 1.737270608
Sum: 4662609
Variance: 5.428823742
Monotonicity: Not monotonic
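The Median Absolute Deviation here is the median of |x - median(x)|, apparently without the 1.4826 consistency factor some libraries apply (the integer-valued persons column reports exactly 1). For persons, a MAD of 1 means at least half the observations lie within one person of the median of 4; indeed the values 3, 4 and 5 together cover 582105 rows (58.2%). A minimal sketch on hypothetical toy data:

```python
from statistics import median


def median_absolute_deviation(values):
    """Median of |x - median(x)|, with no consistency scaling factor."""
    center = median(values)
    return median(abs(v - center) for v in values)


# Hypothetical household sizes, loosely shaped like the persons column.
toy_persons = [2, 3, 4, 4, 5, 6, 9]
print(median_absolute_deviation(toy_persons))  # 1
```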
Histogram with fixed size bins (bins=40)
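The histogram captions report bins=50 for most numeric variables, but bins=40 for persons and bins=36 for pernum, which equal those variables' distinct counts (40 and 36). That is consistent with capping the bin count at the number of distinct values. The equal-width binning sketch below assumes that rule; the rule is inferred from this report, not taken from pandas-profiling's source.

```python
def equal_width_histogram(values, max_bins=50):
    """Count values in equal-width bins spanning [min, max].

    Assumed rule: use min(max_bins, number of distinct values) bins.
    """
    n_bins = min(max_bins, len(set(values)))
    lo, hi = min(values), max(values)
    # A constant column gets an arbitrary width of 1.
    width = (hi - lo) / n_bins if hi > lo else 1.0
    counts = [0] * n_bins
    for v in values:
        # The last bin is closed on the right so the maximum is counted.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(n_bins + 1)]
    return edges, counts


# Hypothetical household sizes with one large outlier.
edges, counts = equal_width_histogram([1, 2, 2, 3, 3, 3, 50])
print(counts)  # [6, 0, 0, 1]
```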
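Common values
Value | Count | Frequency (%)
4 | 227561 | 22.8%
3 | 178336 | 17.8%
5 | 176208 | 17.6%
6 | 111278 | 11.1%
2 | 105235 | 10.5%
7 | 69180 | 6.9%
8 | 37809 | 3.8%
1 | 35276 | 3.5%
9 | 22469 | 2.2%
10 | 14972 | 1.5%
Other values (30) | 21676 | 2.2%

Minimum 5 values
Value | Count | Frequency (%)
1 | 35276 | 3.5%
2 | 105235 | 10.5%
3 | 178336 | 17.8%
4 | 227561 | 22.8%
5 | 176208 | 17.6%

Maximum 5 values
Value | Count | Frequency (%)
50 | 1 | < 0.1%
44 | 1 | < 0.1%
43 | 1 | < 0.1%
42 | 4 | < 0.1%
40 | 3 | < 0.1%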

hhwt
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 6175
Distinct (%): 0.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 9.8261696
Minimum: 0
Maximum: 490
Zeros: 441
Zeros (%): < 0.1%
Memory size: 7.6 MiB

Quantile statistics

Minimum: 0
5th percentile: 2
Q1: 4.64
median: 10
Q3: 10
95th percentile: 22.39
Maximum: 490
Range: 490
Interquartile range (IQR): 5.36

Descriptive statistics

Standard deviation: 9.38108173
Coefficient of variation (CV): 0.9547038279
Kurtosis: 123.6488759
Mean: 9.8261696
Median Absolute Deviation (MAD): 2.55
Skewness: 7.28882983
Sum: 9826169.6
Variance: 88.00469443
Monotonicity: Not monotonic
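hhwt and perwt are strongly right-skewed (skewness about 7.3) with very heavy tails (kurtosis about 124). Assuming the report uses pandas' estimators (Series.skew and Series.kurt), skewness is the adjusted Fisher-Pearson coefficient G1 and kurtosis is the bias-corrected excess kurtosis G2, so a normal distribution scores 0 on both. The pure-Python sketch below implements those two formulas:

```python
from math import sqrt


def _central_moment(xs, k):
    """k-th central moment with 1/n normalisation."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** k for x in xs) / len(xs)


def sample_skewness(xs):
    """Adjusted Fisher-Pearson skewness G1 (needs n >= 3)."""
    n = len(xs)
    g1 = _central_moment(xs, 3) / _central_moment(xs, 2) ** 1.5
    return sqrt(n * (n - 1)) / (n - 2) * g1


def sample_excess_kurtosis(xs):
    """Bias-corrected excess kurtosis G2 (needs n >= 4); 0 for a normal distribution."""
    n = len(xs)
    g2 = _central_moment(xs, 4) / _central_moment(xs, 2) ** 2 - 3
    return ((n + 1) * g2 + 6) * (n - 1) / ((n - 2) * (n - 3))


evenly_spaced = [1, 2, 3, 4, 5]   # symmetric and flat (hypothetical)
right_skewed = [1, 1, 1, 2, 10]   # one large value, like hhwt's long tail
print(sample_skewness(evenly_spaced))                   # 0.0
print(round(sample_excess_kurtosis(evenly_spaced), 6))  # -1.2
print(sample_skewness(right_skewed) > 0)                # True
```

An evenly spaced sample gives an excess kurtosis of -1.2, the value for a uniform distribution, which is close to what the report shows for df_index (-1.200299036).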
Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
10 | 327279 | 32.7%
2 | 53766 | 5.4%
4 | 51667 | 5.2%
6 | 31752 | 3.2%
4.64 | 28724 | 2.9%
8 | 18909 | 1.9%
12 | 7750 | 0.8%
14 | 5681 | 0.6%
16 | 4311 | 0.4%
18 | 3415 | 0.3%
Other values (6165) | 466746 | 46.7%

Minimum 5 values
Value | Count | Frequency (%)
0 | 441 | < 0.1%
0.77 | 6 | < 0.1%
0.83 | 1 | < 0.1%
0.84 | 11 | < 0.1%
0.85 | 7 | < 0.1%

Maximum 5 values
Value | Count | Frequency (%)
490 | 3 | < 0.1%
478 | 1 | < 0.1%
410 | 3 | < 0.1%
394 | 1 | < 0.1%
376 | 2 | < 0.1%

gq
Categorical

Distinct: 6
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 976.8 KiB
households: 995001
other group quarters: 2362
institutions: 1420
group quarters (collective), n.s: 666
1-person unit created by splitting large household: 441
Value | Count | Frequency (%)
households | 995001 | 99.5%
other group quarters | 2362 | 0.2%
institutions | 1420 | 0.1%
group quarters (collective), n.s | 666 | 0.1%
1-person unit created by splitting large household | 441 | < 0.1%
unknown/group quarters not identified | 110 | < 0.1%
Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%
Histogram of lengths of the category

Length

Max length: 50
Median length: 10
Mean length: 10.061722
Min length: 10

geolev1
Real number (ℝ≥0)

Distinct: 312
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 261391.6515
Minimum: 32002
Maximum: 862023
Zeros: 0
Zeros (%): 0.0%
Memory size: 3.8 MiB

Quantile statistics

Minimum: 32002
5th percentile: 32038
Q1: 76031
median: 170005
Q3: 484015
95th percentile: 604025
Maximum: 862023
Range: 830021
Interquartile range (IQR): 407984

Descriptive statistics

Standard deviation: 234868.0863
Coefficient of variation (CV): 0.8985294095
Kurtosis: -0.1831824525
Mean: 261391.6515
Median Absolute Deviation (MAD): 93982
Skewness: 0.9480444751
Sum: 2.613916515e+11
Variance: 5.516301796e+10
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
76035 | 69555 | 7.0%
76031 | 47966 | 4.8%
76029 | 29665 | 3.0%
32006 | 29422 | 2.9%
76043 | 26089 | 2.6%
76041 | 24756 | 2.5%
76033 | 21626 | 2.2%
484020 | 21178 | 2.1%
484030 | 19518 | 2.0%
218009 | 18922 | 1.9%
Other values (302) | 691303 | 69.1%

Minimum 5 values
Value | Count | Frequency (%)
32002 | 5516 | 0.6%
32006 | 29422 | 2.9%
32010 | 681 | 0.1%
32014 | 6230 | 0.6%
32018 | 1777 | 0.2%

Maximum 5 values
Value | Count | Frequency (%)
862023 | 5762 | 0.6%
862022 | 1000 | 0.1%
862021 | 1164 | 0.1%
862020 | 1877 | 0.2%
862019 | 1469 | 0.1%

internet
Categorical

MISSING

Distinct: 4
Distinct (%): < 0.1%
Missing: 179331
Missing (%): 17.9%
Memory size: 976.8 KiB
no: 384908
niu (not in universe): 271723
yes: 159985
unknown: 4053
Value | Count | Frequency (%)
no | 384908 | 38.5%
niu (not in universe) | 271723 | 27.2%
yes | 159985 | 16.0%
unknown | 4053 | 0.4%
(Missing) | 179331 | 17.9%
Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%
Histogram of lengths of the category

Length

Max length: 21
Median length: 3
Mean length: 7.522318
Min length: 2

computer
Categorical

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 976.8 KiB
no: 735566
yes: 256429
niu (not in universe): 4519
unknown/missing: 3486
Value | Count | Frequency (%)
no | 735566 | 73.6%
yes | 256429 | 25.6%
niu (not in universe) | 4519 | 0.5%
unknown/missing | 3486 | 0.3%
Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%
Histogram of lengths of the category

Length

Max length: 21
Median length: 2
Mean length: 2.387608
Min length: 2

pernum
Real number (ℝ≥0)

Distinct: 36
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.829162
Minimum: 1
Maximum: 46
Zeros: 0
Zeros (%): 0.0%
Memory size: 976.6 KiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 1
median: 2
Q3: 4
95th percentile: 6
Maximum: 46
Range: 45
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 1.879112532
Coefficient of variation (CV): 0.6641940378
Kurtosis: 7.18633033
Mean: 2.829162
Median Absolute Deviation (MAD): 1
Skewness: 1.795066969
Sum: 2829162
Variance: 3.531063909
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=36)

Common values
Value | Count | Frequency (%)
1 | 278237 | 27.8%
2 | 243297 | 24.3%
3 | 190857 | 19.1%
4 | 130982 | 13.1%
5 | 74145 | 7.4%
6 | 38625 | 3.9%
7 | 20123 | 2.0%
8 | 10316 | 1.0%
9 | 5680 | 0.6%
10 | 3198 | 0.3%
Other values (26) | 4540 | 0.5%

Minimum 5 values
Value | Count | Frequency (%)
1 | 278237 | 27.8%
2 | 243297 | 24.3%
3 | 190857 | 19.1%
4 | 130982 | 13.1%
5 | 74145 | 7.4%

Maximum 5 values
Value | Count | Frequency (%)
46 | 1 | < 0.1%
39 | 1 | < 0.1%
37 | 1 | < 0.1%
35 | 1 | < 0.1%
32 | 2 | < 0.1%

perwt
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 6174
Distinct (%): 0.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 9.83114623
Minimum: 0.77
Maximum: 490
Zeros: 0
Zeros (%): 0.0%
Memory size: 7.6 MiB

Quantile statistics

Minimum: 0.77
5th percentile: 2
Q1: 4.65
median: 10
Q3: 10
95th percentile: 22.39
Maximum: 490
Range: 489.23
Interquartile range (IQR): 5.35

Descriptive statistics

Standard deviation: 9.379103761
Coefficient of variation (CV): 0.9540193525
Kurtosis: 123.7429894
Mean: 9.83114623
Median Absolute Deviation (MAD): 2.54
Skewness: 7.292926847
Sum: 9831146.23
Variance: 87.96758735
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
10 | 327715 | 32.8%
2 | 53766 | 5.4%
4 | 51667 | 5.2%
6 | 31752 | 3.2%
4.64 | 28724 | 2.9%
8 | 18909 | 1.9%
12 | 7750 | 0.8%
14 | 5681 | 0.6%
16 | 4311 | 0.4%
18 | 3415 | 0.3%
Other values (6164) | 466310 | 46.6%

Minimum 5 values
Value | Count | Frequency (%)
0.77 | 6 | < 0.1%
0.83 | 1 | < 0.1%
0.84 | 11 | < 0.1%
0.85 | 7 | < 0.1%
0.86 | 1 | < 0.1%

Maximum 5 values
Value | Count | Frequency (%)
490 | 3 | < 0.1%
478 | 1 | < 0.1%
410 | 3 | < 0.1%
394 | 1 | < 0.1%
376 | 2 | < 0.1%

age
Real number (ℝ≥0)

MISSING

Distinct: 97
Distinct (%): < 0.1%
Missing: 52174
Missing (%): 5.2%
Infinite: 0
Infinite (%): 0.0%
Mean: 31.67749249
Minimum: 3
Maximum: 99
Zeros: 0
Zeros (%): 0.0%
Memory size: 7.6 MiB

Quantile statistics

Minimum: 3
5th percentile: 5
Q1: 15
median: 28
Q3: 45
95th percentile: 70
Maximum: 99
Range: 96
Interquartile range (IQR): 30

Descriptive statistics

Standard deviation: 20.2215345
Coefficient of variation (CV): 0.63835654
Kurtosis: -0.3792163021
Mean: 31.67749249
Median Absolute Deviation (MAD): 15
Skewness: 0.6225786121
Sum: 30024751
Variance: 408.9104577
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
10 | 20524 | 2.1%
14 | 20128 | 2.0%
12 | 20062 | 2.0%
15 | 20007 | 2.0%
13 | 19701 | 2.0%
11 | 19421 | 1.9%
9 | 19243 | 1.9%
16 | 18980 | 1.9%
18 | 18972 | 1.9%
8 | 18937 | 1.9%
Other values (87) | 751851 | 75.2%
(Missing) | 52174 | 5.2%

Minimum 5 values
Value | Count | Frequency (%)
3 | 18187 | 1.8%
4 | 18231 | 1.8%
5 | 18468 | 1.8%
6 | 18250 | 1.8%
7 | 18694 | 1.9%

Maximum 5 values
Value | Count | Frequency (%)
99 | 57 | < 0.1%
98 | 120 | < 0.1%
97 | 106 | < 0.1%
96 | 152 | < 0.1%
95 | 203 | < 0.1%

sex
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 976.7 KiB
female: 507829
male: 492171
Value | Count | Frequency (%)
female | 507829 | 50.8%
male | 492171 | 49.2%
Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%
Histogram of lengths of the category

Length

Max length: 6
Median length: 6
Mean length: 5.015658
Min length: 4

race
Categorical

MISSING

Distinct: 13
Distinct (%): < 0.1%
Missing: 478178
Missing (%): 47.8%
Memory size: 977.3 KiB
white: 261429
brown (brazil): 173620
black: 36413
mestizo (indigenous and white): 29488
indigenous: 9068
Other values (8): 11804
Value | Count | Frequency (%)
white | 261429 | 26.1%
brown (brazil) | 173620 | 17.4%
black | 36413 | 3.6%
mestizo (indigenous and white) | 29488 | 2.9%
indigenous | 9068 | 0.9%
asian | 4004 | 0.4%
unknown | 3078 | 0.3%
montubio (ecuador) | 2045 | 0.2%
afro-ecuadorian | 1146 | 0.1%
mulatto (black and white) | 1086 | 0.1%
Other values (3) | 445 | < 0.1%
(Missing) | 478178 | 47.8%
Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%
Histogram of lengths of the category

Length

Max length: 30
Median length: 5
Mean length: 6.455739
Min length: 3

indig
Categorical

MISSING

Distinct: 3
Distinct (%): < 0.1%
Missing: 216989
Missing (%): 21.7%
Memory size: 976.7 KiB
no: 689348
yes: 88443
unknown: 5220
Value | Count | Frequency (%)
no | 689348 | 68.9%
yes | 88443 | 8.8%
unknown | 5220 | 0.5%
(Missing) | 216989 | 21.7%
Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%
Histogram of lengths of the category

Length

Max length: 7
Median length: 2
Mean length: 2.331532
Min length: 2

lit
Categorical

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 976.8 KiB
yes, literate: 796402
no, illiterate: 120876
niu (not in universe): 78998
unknown/missing: 3724
Value | Count | Frequency (%)
yes, literate | 796402 | 79.6%
no, illiterate | 120876 | 12.1%
niu (not in universe) | 78998 | 7.9%
unknown/missing | 3724 | 0.4%
Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%
Histogram of lengths of the category

Length

Max length: 21
Median length: 13
Mean length: 13.760308
Min length: 13

edattain
Categorical

HIGH CORRELATION

Distinct: 6
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 976.8 KiB
less than primary completed: 429963
primary completed: 313201
secondary completed: 167529
university completed: 47850
niu (not in universe): 36541
Value | Count | Frequency (%)
less than primary completed | 429963 | 43.0%
primary completed | 313201 | 31.3%
secondary completed | 167529 | 16.8%
university completed | 47850 | 4.8%
niu (not in universe) | 36541 | 3.7%
unknown | 4916 | 0.5%
Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%
Histogram of lengths of the category

Length

Max length: 27
Median length: 20
Mean length: 21.875242
Min length: 7

edattaind
Categorical

HIGH CORRELATION

Distinct: 14
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 977.3 KiB
some primary completed: 225020
primary (6 yrs) completed: 182641
no schooling: 159289
lower secondary general completed: 111377
secondary, general track completed: 108353
Other values (9): 213320
Value | Count | Frequency (%)
some primary completed | 225020 | 22.5%
primary (6 yrs) completed | 182641 | 18.3%
no schooling | 159289 | 15.9%
lower secondary general completed | 111377 | 11.1%
secondary, general track completed | 108353 | 10.8%
university completed | 47850 | 4.8%
primary (4 yrs) completed | 45654 | 4.6%
niu (not in universe) | 36541 | 3.7%
some college completed | 34733 | 3.5%
post-secondary technical education | 20280 | 2.0%
Other values (4) | 28262 | 2.8%
Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%
Histogram of lengths of the category

Length

Max length: 36
Median length: 22
Mean length: 23.834486
Min length: 12

empstat
Categorical

HIGH CORRELATION

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 976.8 KiB
inactive: 397926
employed: 376517
niu (not in universe): 180669
unemployed: 40352
unknown/missing: 4536
Value | Count | Frequency (%)
inactive | 397926 | 39.8%
employed | 376517 | 37.7%
niu (not in universe) | 180669 | 18.1%
unemployed | 40352 | 4.0%
unknown/missing | 4536 | 0.5%
Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%
Histogram of lengths of the category

Length

Max length: 21
Median length: 8
Mean length: 10.461153
Min length: 8

empstatd
Categorical

HIGH CORRELATION

Distinct: 26
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 978.0 KiB
at work: 349999
niu (not in universe): 180669
inactive (not in labor force): 167415
housework: 97386
in school: 83505
Other values (21): 121026
Value | Count | Frequency (%)
at work | 349999 | 35.0%
niu (not in universe) | 180669 | 18.1%
inactive (not in labor force) | 167415 | 16.7%
housework | 97386 | 9.7%
in school | 83505 | 8.4%
inactive, other reasons | 32795 | 3.3%
unemployed, not specified | 31308 | 3.1%
have job, not at work in reference period | 10144 | 1.0%
employed, not specified | 8645 | 0.9%
retirees and living on rent | 5630 | 0.6%
Other values (16) | 32504 | 3.3%
Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%
Histogram of lengths of the category

Length

Max length: 42
Median length: 9
Mean length: 15.759659
Min length: 7

labforce
Categorical

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 976.8 KiB
yes, in the labor force: 408514
no, not in the labor force: 306144
niu (not in universe): 281721
unknown: 3621
Value | Count | Frequency (%)
yes, in the labor force | 408514 | 40.9%
no, not in the labor force | 306144 | 30.6%
niu (not in universe) | 281721 | 28.2%
unknown | 3621 | 0.4%
Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%
Histogram of lengths of the category

Length

Max length: 26
Median length: 23
Mean length: 23.297054
Min length: 7

Interactions

[64 interaction plots, one for each pair of the 8 numeric variables; images not preserved]

Correlations


Pearson's r

The Pearson correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and 1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables (for positive scale factors), so for points lying exactly on a non-horizontal line, r is +1 or -1 depending only on the sign of the slope, not on the line's angle to the x-axis.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
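A direct transcription of that definition in pure Python. Population (1/n) normalisation is used for both the covariance and the standard deviations; the factor cancels, so 1/(n - 1) would give the same r. The toy data are hypothetical, and the second call illustrates the invariance under a shift and a positive rescaling.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (sd_x * sd_y)


# Hypothetical toy data with a nearly linear relationship.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.9]
r = pearson_r(x, y)
# Shifting and positively rescaling x leaves r unchanged (up to rounding).
r_rescaled = pearson_r([10 * v + 3 for v in x], y)
print(round(r, 4), round(r_rescaled, 4))
```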

Spearman's ρ

The Spearman rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and 1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
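In other words, Spearman's ρ is Pearson's r applied to ranks. The sketch below gives tied values the average of the positions they occupy, which is the usual convention but an assumption about how ties are handled in this report, and it repeats a small Pearson helper so the block is self-contained. The data are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Covariance divided by the product of standard deviations (1/n normalisation)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = sqrt(sum((y - my) ** 2 for y in ys) / n)
    return cov / (sx * sy)


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Pearson's r computed on the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# Hypothetical data: y = x**3 is monotonic but not linear.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
print(round(spearman_rho(x, y), 6))  # 1.0
print(pearson_r(x, y) < 1)           # True
```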

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and 1 indicates total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n - 1)/2.
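The count described above is the τ-a variant. It treats tied pairs as neither concordant nor discordant, so with many ties (as in integer columns such as persons) it cannot reach ±1; SciPy's `scipy.stats.kendalltau`, for comparison, computes τ-b by default, which adjusts the denominator for ties. Which variant this report uses is not stated here. A direct O(n²) sketch of τ-a on hypothetical data:

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total number of pairs.

    Pairs tied in x or y count as neither concordant nor discordant.
    """
    n = len(xs)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


# Hypothetical data with one swapped pair: 5 concordant, 1 discordant.
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
print(kendall_tau_a(x, y))  # (5 - 1) / 6 = 0.666...
```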

Phik (φk)

Phik (φk) is a recent, practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient for a bivariate normal input distribution. Extensive documentation is available for the phik Python package.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The standard empirical estimator of Cramér's V is known to be biased, even for large samples, so we use the bias-corrected measure proposed by Bergsma in 2013.
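A sketch of the bias-corrected Cramér's V as Bergsma's correction is commonly implemented: φ² = χ²/n is reduced by (k - 1)(r - 1)/(n - 1), the numbers of rows r and columns k are corrected in the same spirit, and the result is clipped at 0 before taking the square root. The contingency table is built with `collections.Counter`; whether pandas-profiling's own implementation matches this line for line is an assumption, and the toy columns are hypothetical.

```python
from collections import Counter
from math import sqrt


def cramers_v_corrected(xs, ys):
    """Bias-corrected Cramér's V (Bergsma-style correction) for two nominal columns."""
    n = len(xs)
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    cells = Counter(zip(xs, ys))
    r, k = len(row_totals), len(col_totals)

    # Pearson chi-squared statistic of the r x k contingency table.
    chi2 = 0.0
    for a, n_a in row_totals.items():
        for b, n_b in col_totals.items():
            expected = n_a * n_b / n
            chi2 += (cells[(a, b)] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Bias corrections for phi squared and for the table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    return sqrt(phi2_corr / denom) if denom > 0 else 0.0


# Hypothetical toy columns: perfectly associated vs. independent.
same = cramers_v_corrected(["a", "b"] * 50, ["x", "y"] * 50)
independent = cramers_v_corrected(["a", "a", "b", "b"] * 25, ["x", "y", "x", "y"] * 25)
print(round(same, 6), independent)  # 1.0 0.0
```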

Missing values

[4 missing-value diagrams; images not preserved]

Sample

First rows

(index) | df_index | country | year | sample | serial | persons | hhwt | gq | geolev1 | internet | computer | pernum | perwt | age | sex | race | indig | lit | edattain | edattaind | empstat | empstatd | labforce
0 | 26329292 | colombia | 2005 | colombia 2005 | 5.881000e+07 | 5 | 6.16 | households | 170005 | NaN | no | 5 | 6.16 | 11.0 | male | white | no | yes, literate | primary completed | primary (5 yrs) completed | inactive | in school | niu (not in universe)
1 | 45863653 | nicaragua | 2005 | nicaragua 2005 | 2.634000e+07 | 6 | 10.00 | households | 558025 | no | no | 1 | 10.00 | 53.0 | female | NaN | no | yes, literate | less than primary completed | some primary completed | employed | at work | yes, in the labor force
2 | 51297159 | venezuela | 2001 | venezuela 2001 | 2.886910e+08 | 7 | 10.00 | households | 862013 | no | no | 1 | 10.00 | 52.0 | female | NaN | NaN | no, illiterate | less than primary completed | no schooling | unknown/missing | unknown/missing | unknown
3 | 2009550 | argentina | 2010 | argentina 2010 | 6.480550e+08 | 5 | 10.00 | households | 32014 | NaN | no | 3 | 10.00 | 19.0 | male | NaN | NaN | yes, literate | primary completed | primary (6 yrs) completed | employed | at work | yes, in the labor force
4 | 17220209 | brazil | 2010 | brazil 2010 | 3.834011e+09 | 5 | 3.26 | households | 76035 | yes | yes | 4 | 3.26 | 17.0 | female | white | no | yes, literate | secondary completed | secondary, general track completed | inactive | inactive (not in labor force) | no, not in the labor force
5 | 45452679 | mexico | 2015 | mexico 2015 | 2.853132e+09 | 5 | 4.00 | households | 484031 | no | no | 4 | 4.00 | 26.0 | female | NaN | yes | yes, literate | university completed | university completed | employed | at work | yes, in the labor force
6 | 4182065 | brazil | 2010 | brazil 2010 | 6.362100e+07 | 7 | 5.63 | households | 76012 | niu (not in universe) | no | 5 | 5.63 | 22.0 | male | white | no | yes, literate | primary completed | primary (6 yrs) completed | inactive | inactive (not in labor force) | no, not in the labor force
7 | 761071 | argentina | 2010 | argentina 2010 | 2.650540e+08 | 2 | 10.00 | households | 32006 | NaN | no | 2 | 10.00 | 50.0 | male | NaN | NaN | yes, literate | primary completed | primary (6 yrs) completed | employed | at work | yes, in the labor force
8 | 31054217 | dominican republic | 2010 | dominican republic 2010 | 1.624880e+08 | 6 | 10.00 | households | 214008 | no | no | 6 | 10.00 | 10.0 | male | NaN | NaN | yes, literate | less than primary completed | some primary completed | inactive | in school | niu (not in universe)
9 | 36215848 | mexico | 2015 | mexico 2015 | 4.576560e+08 | 8 | 28.00 | households | 484009 | yes | yes | 8 | 28.00 | 50.0 | female | NaN | no | yes, literate | secondary completed | secondary, general track completed | employed | at work | yes, in the labor force

Last rows

|  | df_index | country | year | sample | serial | persons | hhwt | gq | geolev1 | internet | computer | pernum | perwt | age | sex | race | indig | lit | edattain | edattaind | empstat | empstatd | labforce |
| 999990 | 34166301 | haiti | 2003 | haiti 2003 | 1.636770e+08 | 8 | 10.00 | households | 332006 | NaN | no | 4 | 10.00 | 12.0 | female | NaN | NaN | no, illiterate | less than primary completed | no schooling | inactive | housework | niu (not in universe) |
| 999991 | 6186480 | brazil | 2010 | brazil 2010 | 5.580270e+08 | 6 | 10.39 | households | 76021 | niu (not in universe) | no | 2 | 10.39 | 7.0 | male | white | no | yes, literate | less than primary completed | some primary completed | niu (not in universe) | niu (not in universe) | niu (not in universe) |
| 999992 | 30320964 | costa rica | 2011 | costa rica 2011 | 5.718100e+07 | 5 | 10.00 | households | 188002 | no | no | 1 | 10.00 | 38.0 | male | white | no | yes, literate | less than primary completed | some primary completed | employed | at work | yes, in the labor force |
| 999993 | 41488326 | mexico | 2015 | mexico 2015 | 1.817698e+09 | 1 | 2.00 | households | 484020 | no | no | 1 | 2.00 | 57.0 | female | NaN | yes | yes, literate | less than primary completed | some primary completed | inactive | housework | no, not in the labor force |
| 999994 | 16790221 | brazil | 2010 | brazil 2010 | 3.696445e+09 | 10 | 2.48 | households | 76035 | niu (not in universe) | no | 1 | 2.48 | 39.0 | male | white | no | no, illiterate | less than primary completed | some primary completed | employed | at work | yes, in the labor force |
| 999995 | 15875195 | brazil | 2010 | brazil 2010 | 3.403780e+09 | 4 | 15.96 | households | 76033 | niu (not in universe) | no | 3 | 15.96 | 26.0 | male | brown (brazil) | no | yes, literate | primary completed | lower secondary general completed | employed | at work | yes, in the labor force |
| 999996 | 22126704 | brazil | 2010 | brazil 2010 | 5.387448e+09 | 5 | 11.96 | households | 76043 | no | yes | 3 | 11.96 | NaN | female | black | no | niu (not in universe) | less than primary completed | no schooling | niu (not in universe) | niu (not in universe) | niu (not in universe) |
| 999997 | 25764931 | chile | 2002 | chile 2002 | 3.630960e+08 | 4 | 10.00 | households | 152131 | no | no | 3 | 10.00 | 9.0 | female | NaN | no | no, illiterate | less than primary completed | some primary completed | niu (not in universe) | niu (not in universe) | niu (not in universe) |
| 999998 | 49665403 | el salvador | 2007 | el salvador 2007 | 9.858900e+07 | 5 | 10.00 | households | 222006 | no | no | 5 | 10.00 | 4.0 | male | mestizo (indigenous and white) | no | niu (not in universe) | niu (not in universe) | niu (not in universe) | employed | marginally employed | niu (not in universe) |
| 999999 | 6642712 | brazil | 2010 | brazil 2010 | 6.748320e+08 | 5 | 11.97 | households | 76022 | niu (not in universe) | no | 2 | 11.97 | 30.0 | female | brown (brazil) | no | yes, literate | secondary completed | secondary, general track completed | employed | at work | yes, in the labor force |