Python/pandas data code analysis

import numpy as np
import pandas as pd

df = pd.DataFrame({'key1': list('aabba'),
                   'key2': ['one', 'two', 'one', 'two', 'one'],
                   'data1': np.random.randn(5),
                   'data2': np.random.randn(5)})

Python/pandas data mining (fourteen) - groupby, aggregation, group-level operations

grouped = df['data1'].groupby(df['key1'])

grouped.mean()

The grouping key used above is a Series, but a grouping key can be any array of the appropriate length:

states = np.array(['Ohio', 'California', 'California', 'Ohio', 'Ohio'])

years = np.array([2005, 2005, 2006, 2005, 2006])

df['data1'].groupby([states, years]).mean()


df.groupby('key1').mean(numeric_only=True)  # numeric_only=True is needed in pandas 2.0+ to skip key2


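Note that the result has no key2 column: df['key2'] contains non-numeric data, so it is excluded from the aggregation (since pandas 2.0 this is no longer automatic, and you must pass numeric_only=True or select the numeric columns first). By default all numeric columns are aggregated, although sometimes you may want to restrict the aggregation to a subset of them.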
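Since pandas 2.0, a grouped mean no longer drops non-numeric columns silently; calling it on a frame that still contains key2 raises a TypeError. A minimal sketch of the two usual workarounds, using fixed numbers in place of np.random.randn so the output is reproducible (the values are made up for illustration):

```python
import pandas as pd

# Same layout as df above, but with fixed values instead of random data
df = pd.DataFrame({'key1': list('aabba'),
                   'key2': ['one', 'two', 'one', 'two', 'one'],
                   'data1': [1.0, 2.0, 3.0, 4.0, 5.0],
                   'data2': [10.0, 20.0, 30.0, 40.0, 50.0]})

# Option 1: ask pandas to skip non-numeric columns (key2 here)
means = df.groupby('key1').mean(numeric_only=True)

# Option 2: select the columns to aggregate before calling mean
data1_means = df.groupby('key1')[['data1']].mean()

print(means)
print(data1_means)
```

Option 2 is usually preferable on wide frames, because only the selected columns are aggregated at all.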

Iterating over the groups:

for name, group in df.groupby('key1'):
    print(name)
    print(group)


You can see that 'name' represents the value of 'key1' in the group, while 'group' contains the corresponding rows.

With multiple keys, the group name is a tuple of key values:

for (k1, k2), group in df.groupby(['key1', 'key2']):
    print('===k1,k2:')
    print(k1, k2)
    print('===group:')
    print(group)


The groups can also be collected into other data structures, for example a dictionary that maps each key to its sub-DataFrame:

piece = dict(list(df.groupby('key1')))

piece

{'a':       data1     data2 key1 key2
 0 -0.233405 -0.756316    a  one
 1 -0.232103 -0.095894    a  two
 4  1.056224  0.736629    a  one,
 'b':       data1     data2 key1 key2
 2  0.200875  0.598282    b  one
 3 -1.437782  0.107547    b  two}

piece['a']


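By default, groupby groups along the rows (axis=0). You can group along the columns by passing axis=1 (deprecated since pandas 2.1, where grouping the transposed frame is the suggested replacement). Here the columns are grouped by their dtype: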

grouped = df.groupby(df.dtypes, axis=1)

dict(list(grouped))

{dtype('float64'):       data1     data2
 0 -0.233405 -0.756316
 1 -0.232103 -0.095894
 2  0.200875  0.598282
 3 -1.437782  0.107547
 4  1.056224  0.736629,
 dtype('O'):   key1 key2
 0    a  one
 1    a  two
 2    b  one
 3    b  two
 4    a  one}
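Because groupby(..., axis=1) is deprecated in pandas 2.1 and later, the same grouping can be done on the transposed frame instead. A minimal sketch, assuming fixed data and converting the dtypes to strings so the group keys are plain, sortable labels (the string conversion is a choice made for this sketch, not part of the original example):

```python
import pandas as pd

df = pd.DataFrame({'key1': list('aabba'),
                   'key2': ['one', 'two', 'one', 'two', 'one'],
                   'data1': [0.1, 0.2, 0.3, 0.4, 0.5],
                   'data2': [1.0, 2.0, 3.0, 4.0, 5.0]})

# Rows of df.T are the original columns; df.dtypes is indexed by column
# name, so it lines up with df.T's index and can serve as the grouping key.
dtype_names = df.dtypes.astype(str)
groups = {name: list(group.index)
          for name, group in df.T.groupby(dtype_names)}

print(groups)
```

The name of the non-float group depends on the pandas version ('object' in 2.x), which is why the sketch keys on strings rather than dtype objects.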

Selecting one or multiple columns:


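For large datasets, you often need to aggregate only some of the columns. Indexing the GroupBy object with a list of column names selects them (a single name instead of a list returns a grouped Series rather than a DataFrame):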

df.groupby(['key1','key2'])[['data2']].mean()

Group by dictionary or series:

people = pd.DataFrame(np.random.randn(5, 5),
                      columns=list('abcde'),
                      index=['Joe', 'Steve', 'Wes', 'Jim', 'Travis'])

people.loc['Wes', ['b', 'c']] = np.nan  # set a few NaN values (.ix was removed in pandas 1.0)

people


Suppose we know how the columns map to groups:

mapping = {'a':'red', 'b': 'red', 'c': 'blue', 'd': 'blue', 'e': 'red', 'f': 'orange'}

by_column = people.groupby(mapping, axis=1)

by_column.sum()


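Only the columns 'a' through 'e' exist in people, so the unused key 'f' in the mapping is simply ignored. Passing axis=1 tells groupby to apply the mapping to the column labels rather than to the row labels.

A Series works the same way: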

map_series = pd.Series(mapping)

map_series

a       red
b       red
c      blue
d      blue
e       red
f    orange
dtype: object

people.groupby(map_series,axis=1).count()


Group by function:

Using Python functions to define the grouping is more flexible than using a dictionary or Series. A function passed as a grouping key is called once for each index value, and its return values are used as the group names. For example, to group people by the length of their names, pass len:

people.groupby(len).sum()

          a         b         c         d         e
3 -1.308709 -2.353354  1.585584  2.908360 -1.267162
5 -0.688506 -0.187575 -0.048742  1.491272 -0.636704
6  0.110028 -0.932493  1.343791 -1.928363 -0.364745
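Any callable works, not only built-ins like len. A minimal sketch, assuming a hypothetical key function first_letter and deterministic values in place of the random people data:

```python
import numpy as np
import pandas as pd

# Deterministic stand-in for people: row i holds 5*i .. 5*i+4
people = pd.DataFrame(np.arange(25, dtype=float).reshape(5, 5),
                      columns=list('abcde'),
                      index=['Joe', 'Steve', 'Wes', 'Jim', 'Travis'])

def first_letter(name):
    """Hypothetical key function: called once per index label."""
    return name[0]

sizes = people.groupby(first_letter).size()       # rows per group
col_a_sums = people.groupby(first_letter)['a'].sum()

print(sizes)
print(col_a_sums)
```

Here Joe and Jim share the key 'J', so their rows are combined, while every other name forms a group of its own.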

Mixing functions with arrays, lists, dictionaries, and Series is not an issue since everything eventually gets converted to an array.

key_list = ['one','one','one','two','two']

people.groupby([len,key_list]).sum()

Group by index level:

One convenience of hierarchical indexing is that you can aggregate by index level. To do this, pass the level number or name via the level keyword:

columns = pd.MultiIndex.from_arrays([['US','US','US','JP','JP'], [1,3,5,1,3]], names=['cty', 'tenor'])

hier_df = pd.DataFrame(np.random.randn(4,5), columns=columns)

hier_df


hier_df.groupby(level='cty', axis=1).count()


Data aggregation:

Calling a custom aggregate function:
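The example for this step is missing. A minimal sketch, assuming a hypothetical peak_to_peak function and made-up data: agg accepts any function that reduces an array of values to a single value, and calls it once per group.

```python
import pandas as pd

df = pd.DataFrame({'key1': list('aabba'),
                   'data1': [1.0, 2.0, 3.0, 5.0, 4.0]})

def peak_to_peak(arr):
    """Hypothetical custom aggregation: range of values within a group."""
    return arr.max() - arr.min()

ranges = df.groupby('key1')['data1'].agg(peak_to_peak)
print(ranges)
```

Custom functions like this run in Python for each group, so they are generally slower than the built-in methods such as mean or sum.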


Column-oriented multi-function application:

Aggregating a Series or a DataFrame column usually means calling agg (aggregate) with a function, or calling a method such as mean or std directly. Next, we apply different aggregate functions to different columns, or several functions at once.

grouped = tips.groupby(['sex', 'smoker'])

grouped_pct = grouped['tip_pct']  # select the tip_pct column

grouped_pct.agg('mean')  # for the statistics listed in Table 9-1, you can pass the method name as a string

# If you pass a list of functions or function names, each column of the resulting DataFrame is named after the corresponding function.
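The output for this step is missing. A minimal sketch, assuming a small made-up stand-in for the tips data (the values are not from tips.csv) and the hypothetical peak_to_peak function:

```python
import pandas as pd

# Hypothetical stand-in for the tips data set
tips = pd.DataFrame({'sex': ['Female', 'Female', 'Male', 'Male', 'Male'],
                     'smoker': ['No', 'No', 'No', 'Yes', 'Yes'],
                     'tip': [1.0, 2.0, 3.0, 2.0, 4.0],
                     'total_bill': [10.0, 10.0, 20.0, 10.0, 20.0]})
tips['tip_pct'] = tips['tip'] / tips['total_bill']

grouped_pct = tips.groupby(['sex', 'smoker'])['tip_pct']

def peak_to_peak(arr):  # hypothetical custom aggregation
    return arr.max() - arr.min()

# Strings name built-in methods; functions are called once per group.
# Each entry in the list becomes one column named after it.
result = grouped_pct.agg(['mean', 'std', peak_to_peak])
print(result)
```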


The automatically assigned column names are not always descriptive. If you pass a list of (name, function) tuples instead, the first element of each tuple is used as the column name in the resulting DataFrame.
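A minimal sketch of the tuple form, again on a made-up stand-in for the tips data; the labels 'average' and 'spread' are arbitrary names chosen for illustration:

```python
import pandas as pd

# Hypothetical stand-in for the tips data set
tips = pd.DataFrame({'sex': ['Female', 'Female', 'Male', 'Male', 'Male'],
                     'smoker': ['No', 'No', 'No', 'Yes', 'Yes'],
                     'tip': [1.0, 2.0, 3.0, 2.0, 4.0],
                     'total_bill': [10.0, 10.0, 20.0, 10.0, 20.0]})
tips['tip_pct'] = tips['tip'] / tips['total_bill']

grouped_pct = tips.groupby(['sex', 'smoker'])['tip_pct']

# (name, function) tuples: the first element becomes the column label
result = grouped_pct.agg([('average', 'mean'), ('spread', 'std')])
print(result)
```

The ('Male', 'No') group has a single row, so its sample standard deviation is NaN.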


With a DataFrame, you can apply the same list of functions to every column, or apply different functions to different columns.
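A minimal sketch of the first case, applying one list of functions to two selected columns of a made-up tips stand-in; the result gets two-level column labels (column, function):

```python
import pandas as pd

# Hypothetical stand-in for the tips data set
tips = pd.DataFrame({'sex': ['Female', 'Female', 'Male', 'Male', 'Male'],
                     'smoker': ['No', 'No', 'No', 'Yes', 'Yes'],
                     'tip': [1.0, 2.0, 3.0, 2.0, 4.0],
                     'total_bill': [10.0, 10.0, 20.0, 10.0, 20.0]})
tips['tip_pct'] = tips['tip'] / tips['total_bill']

grouped = tips.groupby(['sex', 'smoker'])

# The same list of functions applied to every selected column
result = grouped[['tip_pct', 'total_bill']].agg(['count', 'mean', 'max'])
print(result)
```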


To apply different functions to different columns, pass agg a dictionary that maps column names to functions.


The resulting DataFrame has hierarchical columns only if multiple functions are applied to at least one column.
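A minimal sketch of both dictionary forms on a made-up tips stand-in: one function per column gives flat column labels, while a list for any column switches the whole result to two-level labels.

```python
import pandas as pd

# Hypothetical stand-in for the tips data set
tips = pd.DataFrame({'sex': ['Female', 'Female', 'Male', 'Male', 'Male'],
                     'smoker': ['No', 'No', 'No', 'Yes', 'Yes'],
                     'tip': [1.0, 2.0, 3.0, 2.0, 4.0],
                     'total_bill': [10.0, 10.0, 20.0, 10.0, 20.0]})
tips['tip_pct'] = tips['tip'] / tips['total_bill']

grouped = tips.groupby(['sex', 'smoker'])

# One function per column: flat column labels
flat = grouped.agg({'tip': 'max', 'total_bill': 'sum'})

# A list of functions for at least one column: hierarchical column labels
nested = grouped.agg({'tip_pct': ['min', 'max'], 'total_bill': 'sum'})

print(flat)
print(nested)
```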

Group-level operations and transformations:

Aggregation is only one kind of group operation: it is a special case of data transformation in which each group is reduced to a single value. The transform and apply methods are more general.

The transform method applies a function to each group and places the results in the matching locations of the original object. If each group produces a scalar value, that value is broadcast to every row of the group.

transform is strict about what the passed function may return: either a scalar value that can be broadcast (e.g., np.mean) or an array of the same size as the group.
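A minimal sketch contrasting aggregation with both kinds of transform output, on small made-up data (the key and value columns are assumptions for this sketch):

```python
import pandas as pd

df = pd.DataFrame({'key': ['one', 'two', 'one', 'two', 'one'],
                   'value': [1.0, 10.0, 3.0, 20.0, 5.0]})
grouped = df.groupby('key')['value']

per_group = grouped.mean()                             # aggregation: one value per group
broadcast = grouped.transform('mean')                  # scalar per group, broadcast to every row
demeaned = grouped.transform(lambda s: s - s.mean())   # same-size result per group

print(per_group.tolist())   # [3.0, 15.0]
print(broadcast.tolist())   # [3.0, 15.0, 3.0, 15.0, 3.0]
print(demeaned.tolist())    # [-2.0, -5.0, 0.0, 5.0, 2.0]
```

Both transform results keep the original row order and index, which is what makes them easy to assign back as new columns.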

people = pd.DataFrame(np.random.randn(5, 5),
                      columns=list('abcde'),
                      index=['Joe', 'Steve', 'Wes', 'Jim', 'Travis'])

people

key = ['one', 'two', 'one', 'two', 'one']

people.groupby(key).mean()

people.groupby(key).transform(np.mean)

The transformed result has the same shape as people: each row holds the mean of the group it belongs to.

def demean(arr):
    # subtract the group mean from every value in the group
    return arr - arr.mean()

demeaned = people.groupby(key).transform(demean)

demeaned

demeaned.groupby(key).mean()

The most general groupby method is apply.

tips = pd.read_csv(r'C:\Users\ecaoyng\Desktop\work space\Python\py_for_analysis_code\pydata-book-master\ch08\tips.csv')

tips[:5]

Add a new column, tip_pct, holding the tip as a fraction of the total bill:

tips['tip_pct'] = tips['tip'] / tips['total_bill']

tips[:6]

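Define a function that returns the rows with the n largest values in a given column (by default, the 5 largest tip_pct values):

def top(df, n=5, column='tip_pct'):
    # sort ascending by the column and keep the last n rows
    return df.sort_values(by=column)[-n:]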

top(tips,n=6)

Group by smoker and apply top to each group:

tips.groupby('smoker').apply(top)

Extra keyword arguments passed to apply are forwarded to the function:

tips.groupby(['smoker', 'day']).apply(top, n=1, column='total_bill')

Quantile and bucket analysis:

Combining cut and qcut with groupby makes it easy to perform bucket or quantile analysis on a dataset.

frame = pd.DataFrame({'data1': np.random.randn(1000),
                      'data2': np.random.randn(1000)})

frame[:5]

factor = pd.cut(frame.data1,4)

factor[:10]

0     (0.281, 2.00374]
1     (0.281, 2.00374]
2    (-3.172, -1.442]
3     (-1.442, 0.281]
4     (0.281, 2.00374]
5     (0.281, 2.00374]
6     (-1.442, 0.281]
7     (-1.442, 0.281]
8     (-1.442, 0.281]
9     (-1.442, 0.281]
Name: data1, dtype: category
Categories (4, object): [(-3.172, -1.442] < (-1.442, 0.281] < (0.281, 2.00374] < (2.00374, 3.727]]

def get_stats(group):
    return {'min': group.min(), 'max': group.max(),
            'count': group.count(), 'mean': group.mean()}

grouped = frame.data2.groupby(factor)

grouped.apply(get_stats).unstack()


These are buckets of equal length. To get equal-sized buckets based on the number of samples, use qcut.

Equal-length buckets: equal intervals

Equal-sized buckets: equal number of data points
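A small deterministic illustration of the difference, using skewed values (the squares 1, 4, ..., 100) chosen for this sketch: cut splits the value range into intervals of equal width, so the bucket counts differ, whereas qcut splits at sample quantiles, so each bucket holds the same number of points.

```python
import numpy as np
import pandas as pd

values = pd.Series(np.arange(1, 11, dtype=float) ** 2)  # 1, 4, 9, ..., 100

# Equal-length buckets: 3 intervals of equal width
# (pandas widens the lowest edge slightly so the minimum is included)
equal_length = pd.cut(values, 3)
length_counts = equal_length.cat.codes.value_counts().sort_index().tolist()

# Equal-size buckets: 2 quantile bins; labels=False returns bin numbers
equal_size = pd.qcut(values, 2, labels=False)
size_counts = equal_size.value_counts().sort_index().tolist()

print(length_counts)  # [5, 3, 2]
print(size_counts)    # [5, 5]
```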

grouping = pd.qcut(frame.data1, 10, labels=False)  # labels=False returns the quantile (bin) numbers instead of interval labels

grouped = frame.data2.groupby(grouping)

grouped.apply(get_stats).unstack()
