dataframe Archives - Page 3 of 6

[Solved] Convert first 2 letters of all records to Uppercase in python

November 11, 2022 by Kirat

You can use map to convert the data as required. For example:

    import pandas as pd

    df = pd.DataFrame({'name': ['geeks', 'gor', 'geeks', 'is', 'portal', 'for', 'geeks']})
    # Uppercase the first two characters and keep the rest unchanged
    df['name'] = df['name'].map(lambda x: x[:2].upper() + x[2:])
    print(df)

Output:

         name
    0   GEeks
    1     GOr
    2   GEeks
    3      IS
    4  POrtal
    5     FOr
    6   GEeks
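As a possible alternative to the map call above, the same transformation can be sketched with the vectorized .str accessor. This is only a sketch on the same sample data; one practical difference is that .str propagates missing values, whereas the lambda would raise on a NaN entry.

```python
import pandas as pd

df = pd.DataFrame({"name": ["geeks", "gor", "geeks", "is", "portal", "for", "geeks"]})

# Slice each string with the .str accessor, uppercase the first two
# characters, and concatenate with the untouched remainder.
df["name"] = df["name"].str[:2].str.upper() + df["name"].str[2:]

print(df)
```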

[Solved] How to iterate a vectorized if/else statement over additional columns?

November 11, 2022 by Kirat

Option 1

You can nest numpy.where statements:

    org['LT'] = np.where(org['ID'].isin(ltlist_set), 1,
                         np.where(org['ID2'].isin(ltlist_set), 2, 0))

Option 2

Alternatively, you can use pd.DataFrame.loc sequentially:

    org['LT'] = 0  # default value
    org.loc[org['ID2'].isin(ltlist_set), 'LT'] = 2
    org.loc[org['ID'].isin(ltlist_set), 'LT'] = 1

Option 3

A third option is to use numpy.select:

    conditions = [org['ID'].isin(ltlist_set), org['ID2'].isin(ltlist_set)]
    values = [1, 2]
    org['LT'] = … Read more
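To see how the three options relate, here is a minimal sketch on made-up data. The sample values of org and ltlist_set are assumptions for illustration, and the np.select call with default=0 is one plausible way to finish Option 3, not necessarily the original answer's exact line. In all three variants a match on ID takes priority over a match on ID2.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the excerpt.
org = pd.DataFrame({"ID": [1, 5, 9, 7, 1], "ID2": [2, 3, 4, 8, 3]})
ltlist_set = {1, 3, 4}

in_id = org["ID"].isin(ltlist_set)
in_id2 = org["ID2"].isin(ltlist_set)

# Option 1: nested np.where (ID is checked first, then ID2, else 0).
org["LT_where"] = np.where(in_id, 1, np.where(in_id2, 2, 0))

# Option 2: sequential .loc assignments; the later assignment wins,
# so ID2 is written first and ID second.
org["LT_loc"] = 0
org.loc[in_id2, "LT_loc"] = 2
org.loc[in_id, "LT_loc"] = 1

# Option 3: np.select picks the value of the first condition that is True.
org["LT_select"] = np.select([in_id, in_id2], [1, 2], default=0)

print(org)
```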

[Solved] How to check whether data of a row is in list, inside of np.where()?

November 11, 2022 by Kirat

You can use .isin() directly when filtering in pandas:

    recent_indicators_filtered = recent_indicators[recent_indicators['CountryCode'].isin(developed_countries)]

You can also create a boolean column that is True for developed countries:

    recent_indicators['Developed'] = recent_indicators['CountryCode'].isin(developed_countries)
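Since the question asks about np.where specifically, the boolean mask from .isin() can also be passed to np.where as its condition. The sketch below uses invented country codes and label strings; only the variable and column names are taken from the answer above.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data for illustration only.
recent_indicators = pd.DataFrame({"CountryCode": ["USA", "IND", "DEU", "BRA"]})
developed_countries = ["USA", "DEU"]

# .isin() yields a boolean Series, which np.where accepts as the condition.
is_developed = recent_indicators["CountryCode"].isin(developed_countries)
recent_indicators["Status"] = np.where(is_developed, "Developed", "Developing")

print(recent_indicators)
```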

[Solved] How do I delete rows from a data frame when the DF only has one column [duplicate]

November 9, 2022 by Kirat

The problem is that R is trying to be "helpful" and simplifying your data for you: subsetting a data frame down to a single column returns a plain vector by default. The solution is to pass drop = FALSE (note the two commas, not one):

    df[-1, , drop = FALSE]

This removes the specified row and leaves your data.frame otherwise untouched.

[Solved] Left align the first column and center align the other columns in a Pandas table

October 26, 2022 by Kirat

The table can be formatted in pandas by combining the two missing formatting conditions in a single df. I made the following two changes to the original code.

Hide the index numbers with hide_index() (in pandas 2.0 and later this method has been removed; use hide(axis="index") instead):

    df[["Unit", "Abbreviation", "Storage"]].style.hide_index()

To apply formatting to a subset of columns, you can use the subset parameter. Left align the first column … Read more

[Solved] pd.DataFrame(np.random.randn(8, 4), index=dates, columns=[‘A’, ‘B’, ‘C’, ‘D’])

October 24, 2022 by Kirat

np.random.randn returns random floats sampled from the standard normal distribution (mean 0, variance 1). Its arguments give the shape of the array to return. For example, np.random.randn(1, 2) returns an array with one row and two columns. Similarly, you can pass more dimensions, e.g. np.random.randn(1, ..., 9) with further sizes in place of the dots, to get a higher-dimensional array. Since you … Read more
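The expression in the title can be tried out as follows. This is a minimal sketch: the dates index is an assumption (eight consecutive days built with pd.date_range), since the excerpt does not show how it was defined, and the seed is fixed only so that reruns give the same numbers.

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # fixed seed so repeated runs print the same values

# Assumed index: eight consecutive calendar days (not shown in the excerpt).
dates = pd.date_range("2013-01-01", periods=8)

# randn(8, 4) draws an 8x4 array from the standard normal distribution;
# the DataFrame labels its rows with the dates and its columns A to D.
df = pd.DataFrame(np.random.randn(8, 4), index=dates, columns=["A", "B", "C", "D"])

print(df)
```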

[Solved] Subsetting the data frame and applying cumulative operation on multiple columns

October 23, 2022 by Kirat

Hopefully I got it this time:

    subdf = df.iloc[3:, 1:4]
    df['flag'] = 1 if subdf.values.sum() / subdf.size >= 0.1 else 0

Output:

      unit  A  B  C  row_num  flag
    0  ABC  1  1  1        7     1
    1  DEF  1  1  1        6     1
    2  GEH  1  1  1        5     1
    3  IJK  0  1  0        4     1
    4 … Read more

[Solved] How to split a string without a given delimiter in pandas

October 23, 2022 by Kirat

Assuming your split criterion is a fixed number of characters (e.g. 5 here), you can use:

    df['dfnewcolumn1'] = df['dfcolumn'].str[:5]
    df['dfnewcolumn2'] = df['dfcolumn'].str[5:]

Result:

                    dfcolumn dfnewcolumn1      dfnewcolumn2
    0  PUEF2CarmenXFc034DpEd        PUEF2  CarmenXFc034DpEd
    1  PUEF2BalulanFc034CamH        PUEF2  BalulanFc034CamH
    2  CARF1BalulanFc013Baca        CARF1  BalulanFc013Baca

If your split criterion is the first digit in the string, you can use:

    df[['dfnewcolumn1', 'dfnewcolumnX']] … Read more
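The fixed-width split above can be wrapped in a small helper so the width is a parameter. This is a sketch only: split_at is a hypothetical name introduced here, and the sample rows are the three shown in the result table.

```python
import pandas as pd


def split_at(series: pd.Series, width: int) -> pd.DataFrame:
    """Split each string into its first `width` characters and the rest.

    Hypothetical helper for illustration; not part of pandas.
    """
    return pd.DataFrame({"head": series.str[:width], "tail": series.str[width:]})


df = pd.DataFrame(
    {
        "dfcolumn": [
            "PUEF2CarmenXFc034DpEd",
            "PUEF2BalulanFc034CamH",
            "CARF1BalulanFc013Baca",
        ]
    }
)

parts = split_at(df["dfcolumn"], 5)
df["dfnewcolumn1"] = parts["head"]
df["dfnewcolumn2"] = parts["tail"]

print(df)
```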

[Solved] Pandas: filter data frame by category

October 22, 2022 by Kirat

You can use the pandas groupby method with a dict comprehension, which will do the job as shown below:

    >>> df
         X  Y
    0  Yes  1
    1   No  2
    2  Yes  3
    3  Yes  4
    4   No  2
    5   No  1
    6  Yes  0
    7   No  4
    8   No  4
    9   No  5
    >>> {k: v["Y"].tolist() for k,v … Read more
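A self-contained version of this idea might look like the sketch below. The comprehension in the excerpt is cut off, so iterating over df.groupby("X") is an assumption about how it continues; the data is the frame shown above.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "X": ["Yes", "No", "Yes", "Yes", "No", "No", "Yes", "No", "No", "No"],
        "Y": [1, 2, 3, 4, 2, 1, 0, 4, 4, 5],
    }
)

# Iterating over a groupby yields (group key, sub-frame) pairs, so the
# comprehension maps each category in X to the list of its Y values.
by_category = {k: v["Y"].tolist() for k, v in df.groupby("X")}

print(by_category)
```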

[Solved] in R, How to sum by flowing row in a data frame

October 15, 2022 by Kirat

We could use shift from data.table:

    library(data.table)
    m1 <- na.omit(do.call(cbind, shift(df1$col1, 0:4, type = "lead")))
    rowSums(m1 * (1:5)[col(m1)] / 5)
    #[1] 13.60 12.20 31.24 25.58 30.48 32.58 44.88

Or another option:

    m1 <- embed(df1$col1, 5)
    rowSums(m1 * (5:1)[col(m1)] / 5)
    #[1] 13.60 12.20 31.24 25.58 30.48 32.58 44.88

[Solved] How to perform self join with same row of previous group(month) to bring in additional columns with different expressions in Pyspark

October 14, 2022 by Kirat


[Solved] Compare two dataframes and update the dataframe if the data is different [closed]

October 13, 2022 by Kirat

If I understand the logic correctly . . .

    # imports
    import pandas as pd
    from io import StringIO

    # sample data
    s1 = """id Name score
    111 Jack 2.17
    112 Nick 1.11
    113 Zoe 4.12"""

    s2 = """id Name score
    111 Jack 2.17
    112 Sick 1.10
    113 Zoe 4.12
    114 Jay 12.3"""

    df1 = … Read more
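The rest of the answer is cut off, so the following is not the author's method. It is one possible sketch, under the assumption that rows are matched on id and that values from the second frame should win: it reports which ids changed, which are new, and builds an updated frame with combine_first.

```python
from io import StringIO

import pandas as pd

s1 = """id Name score
111 Jack 2.17
112 Nick 1.11
113 Zoe 4.12"""

s2 = """id Name score
111 Jack 2.17
112 Sick 1.10
113 Zoe 4.12
114 Jay 12.3"""

# Parse the whitespace-separated samples and index both frames by id.
old = pd.read_csv(StringIO(s1), sep=r"\s+").set_index("id")
new = pd.read_csv(StringIO(s2), sep=r"\s+").set_index("id")

# Ids present in both frames whose rows differ in at least one column.
common = old.index.intersection(new.index)
differs = (old.loc[common] != new.loc[common]).any(axis=1)
changed_ids = differs[differs].index.tolist()

# Ids that only exist in the second frame.
new_ids = new.index.difference(old.index).tolist()

# Values from `new` take precedence; ids found only in `old` are kept.
updated = new.combine_first(old)

print(changed_ids, new_ids)
print(updated)
```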

[Solved] R: Is there a function to clean factor levels? characters columnwise in a data frame? [closed]

October 11, 2022 by Kirat

Just use the internal bits from janitor::clean_names():

    #' 'Clean' a character/factor vector like `janitor::clean_names()` does for data frame columns
    #'
    #' Most of the internals are from `janitor::clean_names()`
    #'
    #' @param x a vector of strings or factors
    #' @param refactor if `x` is a factor, return a ref-factored … Read more

[Solved] Automatically subset data frame by factor

October 11, 2022 by Kirat

Maybe not the best way to do it, but it will get the job done:

    library(dplyr)  # provides %>% and filter()

    vars_df = unique(df$x)
    # Create one data frame per level of x, named after that level
    for (i in seq_along(vars_df)) {
      assign(paste0(vars_df[i]), df %>% filter(x == vars_df[i]), envir = .GlobalEnv)
    }