[Solved] pandas column selection using boolean values from another dataframe

Let’s take a simplified example, so I can show the process here.

    language = pd.Index(['zh', 'zh', 'zh', 'zh', 'zh', 'zh', 'zh', 'zh', 'zh', 'zh',
                         'na', 'na', 'na', 'na', 'na', 'na', 'na', 'na', 'na', 'na'],
                        dtype="object", name="Page")
    web = pd.DataFrame(columns=range(len(language)))

    web.shape       # (0, 20)
    language.shape  # (20,)

So language has one entry per column of web, and you …
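A minimal sketch of the selection step, assuming the goal is to keep the columns of web whose matching entry in language is 'zh'. The names language and web come from the excerpt; the single row of numbers in web is made up so the result is visible.

```python
import pandas as pd

# Same layout as the excerpt: 10 'zh' pages followed by 10 'na' pages.
language = pd.Index(["zh"] * 10 + ["na"] * 10, dtype="object", name="Page")

# Hypothetical frame: one row of made-up numbers, one column per entry in `language`.
web = pd.DataFrame([list(range(20))], columns=range(len(language)))

# Comparing an Index with a scalar returns a plain NumPy boolean array,
# which .loc accepts as a positional mask over the columns.
mask = language == "zh"
zh_columns = web.loc[:, mask]

print(zh_columns.shape)  # (1, 10)
```

Because the mask is a bare NumPy array, it is matched to the columns by position, not by label, which is what you want when the two objects only share a length.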

[Solved] Using python and pandas I need to paginate the results from a sql query in sets of 24 rows into a plotly table . How do I do this?

[Solved] Pandas – Count Vectorize Series of Transaction Activities by User [closed]

The pivot_table function in pandas should do what you want. For instance:

    import pandas as pd

    frame = pd.read_csv('myfile.csv', header=None)
    frame.columns = ['user_id', 'date', 'event_type']
    frame_pivoted = frame.pivot_table(
        index='user_id',
        columns='event_type',
        aggfunc='count'
    )

In general, vectorized pandas functions are much faster than for loops, although I haven't compared the performance in your specific case. …
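A self-contained sketch of the same pivot, with a small made-up transaction log standing in for myfile.csv. Passing values='date' keeps the result's columns flat (one per event type), and fill_value=0 turns "no events of this type" into 0 instead of NaN; pd.crosstab is shown as an equivalent shortcut for plain counting.

```python
import pandas as pd

# Made-up transaction log with the column names used in the answer.
frame = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2],
    "date": ["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-01", "2020-01-05"],
    "event_type": ["click", "click", "buy", "click", "view"],
})

# One row per user, one column per event type, cells = number of events.
counts = frame.pivot_table(
    index="user_id",
    columns="event_type",
    values="date",   # any column without missing values works as the thing to count
    aggfunc="count",
    fill_value=0,    # user/event combinations that never occur become 0
)

# Same table built with crosstab, which counts co-occurrences directly.
same = pd.crosstab(frame["user_id"], frame["event_type"])
print(counts)
```

For this data, user 1 gets buy=1, click=2, view=0 and user 2 gets buy=0, click=1, view=1.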

[Solved] splitting url and getting values from that URL in columns

Try using str.split, then chain another .str accessor so you can index into each row's list of pieces.

    import pandas as pd

    data = [{'ID': '1', 'URL': 'https://ckd.pdc.com/pdc/73ba5189-94fd-44aa-88d3-6b36aaa69b02/DDA1610095.zip'}]
    df = pd.DataFrame(data)
    print(df)
    #   ID                                                URL
    # 0  1  https://ckd.pdc.com/pdc/73ba5189-94fd-44aa-88d…

    # Get the file name and strip '.zip' (there is probably a more elegant way to do this)
    df['Zest'] = df.URL.str.split('/').str[-1].str.replace('.zip', '', regex=False)

    # Assign the type into the next column. …
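A minimal sketch of pulling more than one value out of the URL with the same split-then-index idea. The column name Folder is hypothetical; negative indices (-1 for the file name, -2 for the segment before it) keep working even if other URLs have a different number of path segments.

```python
import pandas as pd

data = [{"ID": "1",
         "URL": "https://ckd.pdc.com/pdc/73ba5189-94fd-44aa-88d3-6b36aaa69b02/DDA1610095.zip"}]
df = pd.DataFrame(data)

# Each cell becomes a list of the "/"-separated pieces of the URL.
pieces = df["URL"].str.split("/")

# Hypothetical column: the path segment just before the file name.
df["Folder"] = pieces.str[-2]
# Last segment with a trailing ".zip" removed (the regex is anchored at the end).
df["Zest"] = pieces.str[-1].str.replace(r"\.zip$", "", regex=True)

print(df[["ID", "Folder", "Zest"]])
```

If you prefer one column per piece, df["URL"].str.split("/", expand=True) does that, at the cost of padding shorter URLs with None.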

[Solved] How to find and calculate the number of duplicated rows between two different dataframes? [closed]

You can sort the values of columns c_x and c_y within each row, number the movies of each pair with GroupBy.cumcount, reshape them with DataFrame.pivot, count the non-missing values with DataFrame.count and append the result to df1:

    df2[['c_x', 'c_y']] = np.sort(df2[['c_x', 'c_y']], axis=1)
    df2['g'] = df2.groupby(['c_x', 'c_y']).cumcount().add(1)
    df2 = df2.pivot(index=['c_x', 'c_y'], columns='g', values='movie').add_prefix('movie')
    df2['number'] = df2.count(axis=1)
    print(df2)

    g        movie1 movie2  number
    c_x c_y
    bob dan       c      f       2
    uni a   …
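A runnable sketch of the same pipeline on a tiny made-up df2 (only the column names c_x, c_y and movie come from the answer). The row-wise sort is the key step: it makes (dan, bob) and (bob, dan) the same pair before grouping. The joining step with df1 is left out because its layout is not shown in the excerpt.

```python
import numpy as np
import pandas as pd

# Hypothetical df2: each row is one movie shared by two people, listed in either order.
df2 = pd.DataFrame({
    "c_x": ["bob", "dan", "a"],
    "c_y": ["dan", "bob", "uni"],
    "movie": ["c", "f", "x"],
})

# Sort each pair alphabetically so the order of the two names no longer matters.
df2[["c_x", "c_y"]] = np.sort(df2[["c_x", "c_y"]], axis=1)

# Number the movies within each pair: 1, 2, ...
df2["g"] = df2.groupby(["c_x", "c_y"]).cumcount().add(1)

# One row per pair, one column per movie; then count the non-missing movies.
out = df2.pivot(index=["c_x", "c_y"], columns="g", values="movie").add_prefix("movie")
out["number"] = out.count(axis=1)
print(out)
```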

[Solved] How to iterate a vectorized if/else statement over additional columns?

Option 1

You can nest numpy.where statements:

    org['LT'] = np.where(org['ID'].isin(ltlist_set), 1,
                         np.where(org['ID2'].isin(ltlist_set), 2, 0))

Option 2

Alternatively, you can use pd.DataFrame.loc sequentially:

    org['LT'] = 0  # default value
    org.loc[org['ID2'].isin(ltlist_set), 'LT'] = 2
    org.loc[org['ID'].isin(ltlist_set), 'LT'] = 1  # applied last, so ID takes precedence

Option 3

A third option is to use numpy.select:

    conditions = [org['ID'].isin(ltlist_set), org['ID2'].isin(ltlist_set)]
    values = [1, 2]
    org['LT'] = …
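A small check that the three options agree, on made-up data that reuses the answer's names org, ID, ID2 and ltlist_set. The answer's numpy.select line is cut off; the np.select(..., default=0) call below is our own assumption of how it finishes, chosen to match the 0 default of the other two options.

```python
import numpy as np
import pandas as pd

# Made-up data: row 0 matches on both ID and ID2, row 1 only on ID2.
org = pd.DataFrame({"ID": ["a", "b", "c", "d"], "ID2": ["x", "a", "b", "z"]})
ltlist_set = {"a", "x"}

in_id = org["ID"].isin(ltlist_set)
in_id2 = org["ID2"].isin(ltlist_set)

# Option 1: nested np.where; ID is tested first, so it wins when both match.
opt1 = np.where(in_id, 1, np.where(in_id2, 2, 0))

# Option 2: sequential .loc; the ID rule is written last so it overrides ID2.
opt2 = pd.Series(0, index=org.index)
opt2.loc[in_id2] = 2
opt2.loc[in_id] = 1

# Option 3: np.select takes the value of the first True condition, else the default.
opt3 = np.select([in_id, in_id2], [1, 2], default=0)

org["LT"] = opt3
print(org)
```

np.select scales most cleanly when more columns (ID3, ID4, …) are added: you extend the two lists instead of nesting another np.where.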

[Solved] How to check whether data of a row is in list, inside of np.where()?

You can use .isin() directly in pandas filtering:

    recent_indicators_filtered = recent_indicators[recent_indicators['CountryCode'].isin(developed_countries)]

Also, you can create a boolean column that is True if the country is developed:

    recent_indicators['Developed'] = recent_indicators['CountryCode'].isin(developed_countries)
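Since the question asks about np.where, here is a sketch showing that the same isin() mask can serve as its condition. The data and the "developed"/"developing" labels are made up; recent_indicators, CountryCode and developed_countries are the answer's names.

```python
import numpy as np
import pandas as pd

# Made-up data using the names from the answer.
recent_indicators = pd.DataFrame({"CountryCode": ["USA", "IND", "DEU", "BRA"],
                                  "Value": [1, 2, 3, 4]})
developed_countries = ["USA", "DEU"]

# One boolean mask, reused three ways.
mask = recent_indicators["CountryCode"].isin(developed_countries)

filtered = recent_indicators[mask]           # keep only developed rows
recent_indicators["Developed"] = mask        # boolean column
# Hypothetical labels: isin() works as the condition inside np.where too.
recent_indicators["Status"] = np.where(mask, "developed", "developing")
print(recent_indicators)
```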

[Solved] Search for column values in another column and assign a value from the next column from the row found to another column

You can try creating a dictionary from columns ['CheckStringHere', 'AssociatedValue1'] and replacing the values in the StringToCheck column:

    d = dict(df[['CheckStringHere', 'AssociatedValue1']].to_numpy())
    df['FromNumber'] = df['StringToCheck'].replace(d)
    # or
    df['FromNumber'] = df['StringToCheck'].map(d).fillna(df['FromNumber'])
    print(df)

    StringToCheck FromNumber ToNumber CheckStringHere AssociatedValue1 \ 0 10T 56 AAA_ER 1 1 125T 16 FGGR_DBC 2 2 10T 56 3 125T 16 AssociatedValue2 0 1 2 58 3 24 2

…
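A self-contained sketch of the lookup with made-up values (the column names are the answer's). It shows why the map() variant needs fillna(): map() returns NaN for strings that are not in the dictionary, whereas replace() leaves them unchanged.

```python
import pandas as pd

# Hypothetical frame; "99X" has no match in CheckStringHere.
df = pd.DataFrame({
    "StringToCheck": ["10T", "125T", "99X"],
    "FromNumber": [None, None, 7],
    "CheckStringHere": ["10T", "125T", "AAA"],
    "AssociatedValue1": [56, 16, 1],
})

# Lookup table CheckStringHere -> AssociatedValue1 (same result as dict(...to_numpy())).
d = dict(zip(df["CheckStringHere"], df["AssociatedValue1"]))

# replace(): unmatched strings stay as they are.
replaced = df["StringToCheck"].replace(d)

# map(): unmatched strings become NaN, so fall back to the existing FromNumber.
df["FromNumber"] = df["StringToCheck"].map(d).fillna(df["FromNumber"])
print(df)
```

Use replace() when leaving unmatched keys as-is is acceptable, and map() plus fillna() when you want to keep a column's previous value (or some default) instead.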

[Solved] Flatten list of lists within dictionary values before processing in Pandas

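As a follow-up to the original post: I managed to resolve the issue and flattened the lists within the dictionary with the help of the following generator function, taken from here:

    from collections.abc import Iterable  # collections.Iterable was removed in Python 3.10

    def flatten(l):
        """Recursively yield the non-list items of an arbitrarily nested list."""
        for el in l:
            # Strings are iterable too, so treat them as leaves (basestring is Python 2 only)
            if isinstance(el, Iterable) and not isinstance(el, (str, bytes)):
                yield from flatten(el)
            else:
                yield el

And …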
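A sketch of applying the generator to every value of a dictionary before handing it to pandas. The dictionary raw is made up; flatten is the Python 3 version of the function above, repeated so the block runs on its own.

```python
from collections.abc import Iterable

import pandas as pd


def flatten(l):
    """Recursively yield the non-list items of an arbitrarily nested list."""
    for el in l:
        if isinstance(el, Iterable) and not isinstance(el, (str, bytes)):
            yield from flatten(el)
        else:
            yield el


# Made-up dictionary whose values are nested lists.
raw = {"a": [1, [2, 3]], "b": [[4], [5, [6]]]}

# Flatten each value; list() drains the generator.
flat = {key: list(flatten(value)) for key, value in raw.items()}

# Equal-length flat lists turn into DataFrame columns directly.
df = pd.DataFrame(flat)
print(df)
```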

[Solved] Forecasting based on the historical figures

You need to join those two dataframes to multiply the two columns.

    merged_df = segmentallocation.merge(second, on=['year', 'month'], how='left', suffixes=['', '_second'])
    for c in interested_columns:
        merged_df['allocation' + str(c)] = merged_df['%of allocation' + str(c)] * merged_df[c]
    merged_df

       year month segment x y z k %of allocationx %of allocationy %of allocationz %of allocationk x_second y_second z_second k_second allocationx allocationy allocationz allocationk
    0  2018   FEB       A 2094663 …
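A minimal sketch of the merge-then-multiply pattern with two made-up frames: segmentallocation holds each segment's share per month and second holds the monthly totals to split. Column names follow the answer; the values and the choice of which frame holds x and y are assumptions.

```python
import pandas as pd

# Hypothetical inputs: per-segment allocation shares and monthly totals.
segmentallocation = pd.DataFrame({
    "year": [2018, 2018], "month": ["FEB", "FEB"], "segment": ["A", "B"],
    "%of allocationx": [0.25, 0.75],
    "%of allocationy": [0.5, 0.5],
})
second = pd.DataFrame({"year": [2018], "month": ["FEB"], "x": [1000], "y": [200]})

interested_columns = ["x", "y"]

# Left join: every segment row gets the totals for its year/month.
merged_df = segmentallocation.merge(second, on=["year", "month"], how="left",
                                    suffixes=("", "_second"))

# Share times total gives each segment's allocation.
for c in interested_columns:
    merged_df["allocation" + c] = merged_df["%of allocation" + c] * merged_df[c]

print(merged_df[["segment", "allocationx", "allocationy"]])
```

The left join keeps segments whose month has no totals in second; their allocation columns come out as NaN rather than the rows silently disappearing.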

[Solved] Matplotlib spacing in xaxis

Is this what you want? Try adding the following lines to your code:

    plt.xticks(rotation=90)
    plt.gca().margins(x=0)
    plt.gcf().canvas.draw()
    tl = plt.gca().get_xticklabels()
    maxsize = max([t.get_window_extent().width for t in tl])
    m = 0.2  # inch margin
    s = maxsize/plt.gcf().dpi*150+2*m  # 150 appears to stand for the number of tick labels
    margin = m/plt.gcf().get_size_inches()[0]

    plt.gcf().subplots_adjust(left=margin, right=1.-margin)
    plt.gcf().set_size_inches(s, plt.gcf().get_size_inches()[1])
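A self-contained sketch of the same idea: measure the widest rotated tick label, then make the figure wide enough to give every label that much room. The 60 made-up categories and the headless Agg backend are assumptions; the label count replaces the hard-coded 150, and the margin fraction is computed from the new width rather than the width before resizing.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical data: many categories on the x-axis.
labels = [f"category_{i}" for i in range(60)]
values = list(range(60))

fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(labels, values)
ax.tick_params(axis="x", labelrotation=90)
ax.margins(x=0)

fig.canvas.draw()  # label sizes are only known after a draw
widths = [t.get_window_extent().width for t in ax.get_xticklabels()]
maxsize = max(widths)  # pixels needed by the widest (rotated) label

m = 0.2                              # margin on each side, in inches
n = len(labels)                      # real label count instead of 150
s = maxsize / fig.dpi * n + 2 * m    # new figure width, in inches

fig.set_size_inches(s, fig.get_size_inches()[1])
margin = m / s                       # margin as a fraction of the new width
fig.subplots_adjust(left=margin, right=1.0 - margin)
fig.savefig("spaced_ticks.png")
```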