How to make a nominal variable from a quantitative one

Question

Suppose there is an array of observations in which one of the variables takes random values from 1 to 100. How can I derive an ordinal variable from it that takes values depending on specified thresholds (for example: "1" if < 50; "2" if in [50, 60]; otherwise "3")? I wanted to use map or a lambda function, but failed.

Accepted Answer · 2018-02-28T14:29:53

    In [123]: lst
    Out[123]: [87, 92, 22, 1, 94, 18, 92, 44, 77, 73, 53, 24, 9, 67, 20]

    In [142]: res = ["1" if x < 50 else "2" if x <= 60 else "3" for x in lst]

    In [143]: res
    Out[143]: ['3', '3', '1', '1', '3', '1', '3', '1', '3', '3', '2', '1', '1', '3', '1']
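Since the question mentions map and lambda, the same mapping can also be sketched with them. The helper name to_ordinal is hypothetical and only used for illustration; the thresholds are the ones from the question.

```python
def to_ordinal(x):
    """Return "1" for x < 50, "2" for 50 <= x <= 60, otherwise "3"."""
    if x < 50:
        return "1"
    if x <= 60:
        return "2"
    return "3"

lst = [87, 92, 22, 1, 94, 18, 92, 44, 77, 73, 53, 24, 9, 67, 20]

# map() returns a lazy iterator in Python 3, so wrap it in list()
res_map = list(map(to_ordinal, lst))

# The same mapping with an inline lambda
res_lambda = list(map(lambda x: "1" if x < 50 else "2" if x <= 60 else "3", lst))

print(res_map)
```

Both variants should give the same list as the comprehension; the named function is probably easier to read and to test than the nested conditional expression.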

For large amounts of data it is better to use NumPy or pandas, which work much faster:

    import pandas as pd
    import numpy as np  # for generating random numbers

Sample input data:

    In [166]: df = pd.DataFrame({'var': np.random.randint(1, 101, 10)})

    In [167]: df
    Out[167]:
       var
    0   38
    1  100
    2   74
    3    5
    4   66
    5   32
    6   91
    7    6
    8   68
    9   50

Use pd.cut():

    In [168]: df['tag1'] = pd.cut(df['var'], bins=[0, 50, 60, 101], labels=[1, 2, 3])

    In [169]: df['tag2'] = pd.cut(df['var'], bins=[0, 50, 60, 101])

Result. If you do not pass the labels parameter, pd.cut() constructs the interval labels itself, which may come in handy:

    In [170]: df
    Out[170]:
       var tag1       tag2
    0   38    1    (0, 50]
    1  100    3  (60, 101]
    2   74    3  (60, 101]
    3    5    1    (0, 50]
    4   66    3  (60, 101]
    5   32    1    (0, 50]
    6   91    3  (60, 101]
    7    6    1    (0, 50]
    8   68    3  (60, 101]
    9   50    1    (0, 50]

You can also include the left boundaries of the ranges instead of the right:

    In [172]: df['tag3'] = pd.cut(df['var'], bins=[0, 50, 60, 101], right=False)

    In [173]: df
    Out[173]:
       var tag1       tag2       tag3
    0   38    1    (0, 50]    [0, 50)
    1  100    3  (60, 101]  [60, 101)
    2   74    3  (60, 101]  [60, 101)
    3    5    1    (0, 50]    [0, 50)
    4   66    3  (60, 101]  [60, 101)
    5   32    1    (0, 50]    [0, 50)
    6   91    3  (60, 101]  [60, 101)
    7    6    1    (0, 50]    [0, 50)
    8   68    3  (60, 101]  [60, 101)
    9   50    1    (0, 50]   [50, 60)
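Note that neither binning above matches the thresholds in the question exactly: with the default right=True the value 50 lands in (0, 50], and with right=False the value 60 lands in [60, 101). If the data are integers, as the question suggests, one possible workaround is to shift the bin edges. The sketch below uses a made-up sample that covers the boundary values; the column name tag is an assumption.

```python
import pandas as pd

# Hypothetical sample chosen to cover the boundary values 49, 50, 60, 61
df = pd.DataFrame({'var': [1, 49, 50, 55, 60, 61, 100]})

# Right-closed bins (0, 49], (49, 60], (60, 100]; for integer data this is
# equivalent to x < 50, 50 <= x <= 60, x > 60
df['tag'] = pd.cut(df['var'], bins=[0, 49, 60, 100], labels=[1, 2, 3])

print(df)
```

This only works because the values are whole numbers; for floats such as 49.5 the shifted edges would give a different answer than the original thresholds.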

The 50 <= check is redundant; it works without it too. - insolor

jfs · Answer 2 · 2018-02-28T17:06:06

There is also the numpy.digitize() function, which returns the indices of the ranges (bins) to which the array elements belong:

    >>> import numpy as np
    >>> a = np.random.randint(1, 101, size=10)
    >>> a
    array([16, 42, 19, 88, 69, 15, 5, 1, 33, 50])
    >>> np.digitize(a, [1, 50, 60, 101])
    array([1, 1, 1, 3, 3, 1, 1, 1, 1, 2])

1 <= 16 < 50 so the range number for 16 is 1
50 <= 50 < 60 so the range number for 50 is 2
60 <= 88 < 101 so the range number for 88 is 3
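With these bins the value 60 falls into range 3, whereas the question asks for [50, 60] to map to "2". Assuming integer data, a minimal sketch that reproduces the question's thresholds is to pass only the two inner edges and shift the upper one to 61. The sample array is made up to cover the boundary values.

```python
import numpy as np

# Hypothetical sample covering the boundary values 49, 50, 60, 61
a = np.array([1, 49, 50, 55, 60, 61, 100])

# digitize returns 0 for x < 50, 1 for 50 <= x < 61, 2 for x >= 61;
# adding 1 turns these indices into the labels 1, 2, 3
tags = np.digitize(a, [50, 61]) + 1

print(tags)
```

Because np.digitize returns 0 and len(bins) for values outside the edges, the outer edges 1 and 101 are not needed here.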

@MaxU: I think the author is simply not familiar with the [50, 60) notation. For example, the question does not say whether "from 1 to 100" is inclusive or not.
Therefore, I stated the ranges explicitly so that there would be no discrepancies.
