Python/pandas data code analysis

    import pandas as pd
    import numpy as np

    df = pd.DataFrame({'key1': list('aabba'),
                       'key2': ['one', 'two', 'one', 'two', 'one'],
                       'data1': np.random.randn(5),
                       'data2': np.random.randn(5)})
    df

    Python/pandas data mining (fourteen) - groupby, aggregation, group-level operations

    grouped = df['data1'].groupby(df['key1'])

    grouped.mean()

    The grouping keys used above are Series. However, the grouping key can be any array of appropriate length.

    states = np.array(['Ohio', 'California', 'California', 'Ohio', 'Ohio'])

    years = np.array([2005, 2005, 2006, 2005, 2006])

    df['data1'].groupby([states, years]).mean()


    df.groupby('key1').mean()


    Notice that there is no key2 column in the result: df['key2'] is non-numeric, so it is excluded from the aggregation. By default all numeric columns are aggregated, though you will sometimes want to restrict the aggregation to a subset of columns. (Note that in pandas 2.0 and later, non-numeric columns are no longer dropped silently; you must pass numeric_only=True or select the columns explicitly.)
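    As a minimal sketch of restricting the aggregation to an explicit column subset (reusing the df defined above), which works the same way on old and new pandas versions:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'key1': list('aabba'),
                   'key2': ['one', 'two', 'one', 'two', 'one'],
                   'data1': np.random.randn(5),
                   'data2': np.random.randn(5)})

# Select the numeric columns explicitly before aggregating, so the
# non-numeric key2 column never enters the aggregation at all.
result = df.groupby('key1')[['data1', 'data2']].mean()
print(result)
```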

    Iterating over the groups:

    for name, group in df.groupby('key1'):
        print(name)
        print(group)


    You can see that 'name' represents the value of 'key1' in the group, while 'group' contains the corresponding rows.

    Similarly:

    for (k1, k2), group in df.groupby(['key1', 'key2']):
        print('===k1,k2:')
        print(k1, k2)
        print('===group:')
        print(group)


    You can perform operations on the grouped data, such as converting it into a dictionary.

    piece = dict(list(df.groupby('key1')))

    piece

    {'a':       data1     data2 key1 key2
     0      -0.233405 -0.756316    a  one
     1      -0.232103 -0.095894    a  two
     4       1.056224  0.736629    a  one,
     'b':       data1     data2 key1 key2
     2       0.200875  0.598282    b  one
     3      -1.437782  0.107547    b  two}

    piece['a']


    By default, groupby groups along axis=0 (the rows). You can also group along the columns by passing axis=1. (Note that axis=1 grouping is deprecated in recent pandas releases; transposing first and grouping the rows is the forward-compatible alternative.)

    grouped = df.groupby(df.dtypes, axis=1)

    dict(list(grouped))

    {dtype('float64'):
            data1     data2
     0  -0.233405 -0.756316
     1  -0.232103 -0.095894
     2   0.200875  0.598282
     3  -1.437782  0.107547
     4   1.056224  0.736629,
     dtype('O'):
       key1 key2
     0    a  one
     1    a  two
     2    b  one
     3    b  two
     4    a  one}

    Selecting one or multiple columns:


    For large datasets, often only some columns need to be aggregated.

    df.groupby(['key1','key2'])[['data2']].mean()

    Group by dictionary or series:

    people = pd.DataFrame(np.random.randn(5, 5),
                          columns=list('abcde'),
                          index=['Joe', 'Steve', 'Wes', 'Jim', 'Travis'])
    people.loc['Wes', ['b', 'c']] = np.nan  # set a few NaN values (.ix is removed in modern pandas; use .loc)

    people


    Suppose we know a grouping relationship for the columns:

    mapping = {'a':'red', 'b': 'red', 'c': 'blue', 'd': 'blue', 'e': 'red', 'f': 'orange'}

    by_column = people.groupby(mapping, axis=1)

    by_column.sum()


    Note that mapping contains an unused key 'f'; since the DataFrame only has columns 'a' through 'e', the 'orange' group simply never appears. Also, without axis=1 the mapping would be applied to the row index rather than the columns.

    The same applies to Series.

    map_series = pd.Series(mapping)

    map_series

    a       red
    b       red
    c      blue
    d      blue
    e       red
    f    orange
    dtype: object

    people.groupby(map_series,axis=1).count()


    Group by function:

    Compared to dictionaries or Series, defining the grouping with a Python function is more flexible. Any function passed as a grouping key is called once per index value, and its return values are used as the group names. For example, to group by the length of each person's name, simply pass len.

    people.groupby(len).sum()

               a         b         c         d         e
    3  -1.308709 -2.353354  1.585584  2.908360 -1.267162
    5  -0.688506 -0.187575 -0.048742  1.491272 -0.636704
    6   0.110028 -0.932493  1.343791 -1.928363 -0.364745

    Mixing functions with arrays, lists, dictionaries, and Series is not an issue since everything eventually gets converted to an array.

    key_list = ['one','one','one','two','two']

    people.groupby([len,key_list]).sum()

    Group by index level:

    The most convenient aspect of a hierarchical index is that it allows aggregation based on the index level. To do this, you can specify the level number or name via the 'level' keyword:

    columns = pd.MultiIndex.from_arrays([['US','US','US','JP','JP'], [1,3,5,1,3]], names=['cty', 'tenor'])

    hier_df = pd.DataFrame(np.random.randn(4,5), columns=columns)

    hier_df


    hier_df.groupby(level='cty', axis=1).count()


    Data aggregation:

    Calling a custom aggregate function:

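    The original screenshots for this step did not survive extraction. As a sketch: any function that reduces an array to a scalar can be passed to agg (the name peak_to_peak below is illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'key1': list('aabba'),
                   'data1': np.random.randn(5),
                   'data2': np.random.randn(5)})

def peak_to_peak(arr):
    # Custom aggregate: reduce each group's column to a single scalar.
    return arr.max() - arr.min()

result = df.groupby('key1').agg(peak_to_peak)
print(result)
```

    The function is applied once per group, per column, just like a built-in aggregation.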

    Column-oriented multi-function application:

    Aggregate operations on a Series or on DataFrame columns typically use agg, or call optimized methods such as mean and std directly. Next, we want to apply different aggregate functions to different columns, or apply multiple functions at once.

    grouped = tips.groupby(['sex', 'smoker'])
    grouped_pct = grouped['tip_pct']
    grouped_pct.agg('mean')  # descriptive statistics like those in Table 9-1 can be passed by name as a string
    # if you pass a list of functions instead, the resulting columns are named after those functions


    The automatically generated column names are not very descriptive. If you pass a list of (name, function) tuples, the first element of each tuple is used as the column name.

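    A minimal sketch with (name, function) tuples, using a small synthetic tips-like DataFrame since the original tips.csv is not available here:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the tips dataset.
tips = pd.DataFrame({
    'sex': ['Male', 'Female', 'Male', 'Female', 'Male', 'Female'],
    'smoker': ['Yes', 'No', 'No', 'Yes', 'Yes', 'No'],
    'total_bill': [16.99, 10.34, 21.01, 23.68, 24.59, 25.29],
    'tip': [1.01, 1.66, 3.50, 3.31, 3.61, 4.71],
})
tips['tip_pct'] = tips['tip'] / tips['total_bill']

grouped_pct = tips.groupby(['sex', 'smoker'])['tip_pct']
# The first tuple element becomes the column name in the result.
result = grouped_pct.agg([('average', 'mean'), ('stdev', np.std)])
print(result)
```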

    For a DataFrame, you can define a list of functions to apply to all columns, or apply different functions to different columns.

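    A sketch of applying one list of functions to several columns at once (again on a hypothetical tips-like DataFrame); the result gets hierarchical columns:

```python
import pandas as pd

# Hypothetical stand-in for the tips dataset.
tips = pd.DataFrame({
    'sex': ['Male', 'Female', 'Male', 'Female', 'Male', 'Female'],
    'smoker': ['Yes', 'No', 'No', 'Yes', 'Yes', 'No'],
    'total_bill': [16.99, 10.34, 21.01, 23.68, 24.59, 25.29],
    'tip': [1.01, 1.66, 3.50, 3.31, 3.61, 4.71],
})
tips['tip_pct'] = tips['tip'] / tips['total_bill']

grouped = tips.groupby(['sex', 'smoker'])
functions = ['count', 'mean', 'max']
# Each selected column gets all three aggregations,
# producing a MultiIndex on the columns: (column, function).
result = grouped[['tip_pct', 'total_bill']].agg(functions)
print(result)
```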

    To apply different functions to different columns, pass agg a dictionary mapping column names to functions.

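    A sketch of the dictionary form (hypothetical tips-like data as above; the 'size' column stands in for the party size in the real dataset):

```python
import pandas as pd

# Hypothetical stand-in for the tips dataset.
tips = pd.DataFrame({
    'sex': ['Male', 'Female', 'Male', 'Female', 'Male', 'Female'],
    'smoker': ['Yes', 'No', 'No', 'Yes', 'Yes', 'No'],
    'total_bill': [16.99, 10.34, 21.01, 23.68, 24.59, 25.29],
    'tip': [1.01, 1.66, 3.50, 3.31, 3.61, 4.71],
    'size': [2, 3, 3, 2, 4, 4],
})
tips['tip_pct'] = tips['tip'] / tips['total_bill']

grouped = tips.groupby(['sex', 'smoker'])
# Dict maps each column to its own function(s); only tip_pct
# gets multiple functions, so only it spawns several result columns.
result = grouped.agg({'tip_pct': ['min', 'max', 'mean'], 'size': 'sum'})
print(result)
```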

    The resulting DataFrame has hierarchical columns only when multiple functions are applied to at least one column.

    Group-level operations and transformations:

    Aggregation is just one type of grouping operation, which is a special form of data transformation. Transform and apply are more versatile.

    Transform will apply a function to each group and then place the results in the appropriate location. If each group produces a scalar value, the scalar value will be broadcast.

    Transform is also a special function with strict conditions: the passed function can only produce two kinds of results, either a scalar value that can be broadcast (e.g., np.mean), or an array of the same size as the group.

    people = pd.DataFrame(np.random.randn(5, 5),
                          columns=list('abcde'),
                          index=['Joe', 'Steve', 'Wes', 'Jim', 'Travis'])
    people



    key = ['one', 'two', 'one', 'two', 'one']
    people.groupby(key).mean()


    people.groupby(key).transform(np.mean)


    You can see that each group's mean from the previous result has been broadcast back to every row of the corresponding group.

    def demean(arr):
        return arr - arr.mean()

    demeaned = people.groupby(key).transform(demean)
    demeaned
    demeaned.groupby(key).mean()

    The most general groupby method is apply.

    tips = pd.read_csv(r'C:\Users\ecaoyng\Desktop\work space\Python\py_for_analysis_code\pydata-book-master\ch08\tips.csv')
    tips[:5]


    Generating a new column:

    tips['tip_pct'] = tips['tip'] / tips['total_bill']
    tips[:6]


    Select the top 5 tip_pct values based on the grouping:

    def top(df, n=5, column='tip_pct'):
        # sort_index(by=...) was removed from pandas; sort_values is the current API
        return df.sort_values(by=column)[-n:]

    top(tips, n=6)


    Group by smoker and apply the function:

    tips.groupby('smoker').apply(top)


    Multi-parameter version:

    tips.groupby(['smoker', 'day']).apply(top, n=1, column='total_bill')


    Quantile and bucket analysis:

    cut and qcut combined with groupby make it easy to perform bucket or quantile analysis on a dataset.

    frame = pd.DataFrame({'data1': np.random.randn(1000),
                          'data2': np.random.randn(1000)})
    frame[:5]


    factor = pd.cut(frame.data1,4)

    factor[:10]

    0     (0.281, 2.00374]
    1     (0.281, 2.00374]
    2    (-3.172, -1.442]
    3     (-1.442, 0.281]
    4     (0.281, 2.00374]
    5     (0.281, 2.00374]
    6     (-1.442, 0.281]
    7     (-1.442, 0.281]
    8     (-1.442, 0.281]
    9     (-1.442, 0.281]
    Name: data1, dtype: category
    Categories (4, object): [(-3.172, -1.442] < (-1.442, 0.281] < (0.281, 2.00374] < (2.00374, 3.727]]

    def get_stats(group):
        return {'min': group.min(), 'max': group.max(),
                'count': group.count(), 'mean': group.mean()}

    grouped = frame.data2.groupby(factor)
    grouped.apply(get_stats).unstack()


    These are buckets of equal length. To get equal-sized buckets based on the number of samples, use qcut.

    Equal-length buckets: equal intervals

    Equal-sized buckets: equal number of data points

    grouping = pd.qcut(frame.data1, 10, labels=False)  # labels=False returns the quantile bin numbers instead of interval labels

    grouped = frame.data2.groupby(grouping)

    grouped.apply(get_stats).unstack()

