dataframe Archives

[Solved] How to generate a “triangular” data frame with as many columns as the row indicates?

March 4, 2023 by Kirat

Try this:

    (df.join(pd.DataFrame(df['number'].map(lambda x: range(1, x + 1)).tolist())
       .rename(lambda x: 'C{}'.format(x + 1), axis=1)))

Output:

       number  C1   C2   C3   C4   C5   C6
    0       1   1  NaN  NaN  NaN  NaN  NaN
    1       2   1  2.0  NaN  NaN  NaN  NaN
    2       3   1  2.0  3.0  NaN  NaN  NaN
    3       4   1  2.0  3.0  4.0  NaN  NaN
    4       6   1  2.0  3.0  …
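For anyone who wants to run the idea end to end, here is a minimal sketch. The input frame is an assumption: the excerpt does not show it, so the values below are simply the `number` column read off the output, and the names `counts` and `triangular` are made up for illustration.

```python
import pandas as pd

# Hypothetical input: reconstructed from the 'number' column of the output,
# since the original frame is not shown in the excerpt.
df = pd.DataFrame({'number': [1, 2, 3, 4, 6]})

# Build one list 1..n per row; rows of unequal length are padded with NaN
# when pandas assembles them into a frame.
counts = pd.DataFrame(df['number'].map(lambda n: list(range(1, n + 1))).tolist())

# The positional labels 0, 1, 2, ... become C1, C2, C3, ...
counts = counts.rename(columns=lambda i: 'C{}'.format(i + 1))

# Join on the shared default index to get the "triangular" frame.
triangular = df.join(counts)
print(triangular)
```

The padding is what turns most columns into floats: a column that contains NaN cannot stay integer-typed by default.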

[Solved] R: create new dataframe rows are columns from another dataframe

February 23, 2023 by Kirat

Actually figured it out – it was very simple:

    NewDataFrame <- data.frame(colnames(Original))

[Solved] How can I find the overlapping values of these ranges in R? [duplicate]

February 10, 2023 by Kirat

You could try using the GenomicRanges package.

    library(dplyr)
    library(GenomicRanges)

Here we load in the example input data. (This is an inelegant way to do this, I know, but I was lazy and the Sublime multiline edit made it easy.) Note: I don’t know what the “1” column means, but I kept it in …

[Solved] how to create a dataframe in R from ” Min. 1stQu Median Mean 3rdQu Max. NA’s”? [closed]

February 5, 2023 by Kirat

One approach is to use the tidy function from the broom package. It is versatile and can organize most R statistical results into a data frame.

    library(broom)
    set.seed(1)
    x <- rnorm(1000)
    x[sample(1:length(x), 100)] <- NA
    df <- tidy(summary(x))
    df
      minimum      q1   median    mean     q3 maximum NA's
    1  -3.008 -0.6834 -0.01371 0.00106 0.6978    3.81  100

As …

[Solved] How to change column names from numbers to names [closed]

January 26, 2023 by Kirat

First, move the non-date columns into the index, then use rename and reset_index:

    df = df.set_index('CashFlows')
    df = df.rename(columns=lambda x: 'year_' + str(x.year))
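A small runnable sketch of those steps, under the assumption that the remaining column labels are Timestamps (which is what `x.year` needs). The sample rows and values are hypothetical, and the final `reset_index` is included because the answer mentions it.

```python
import pandas as pd

# Hypothetical data: one label column plus columns keyed by Timestamp.
df = pd.DataFrame({
    'CashFlows': ['Revenue', 'Costs'],
    pd.Timestamp('2020-12-31'): [100, 40],
    pd.Timestamp('2021-12-31'): [120, 45],
})

# Move the non-date column out of the way so only date labels are renamed.
df = df.set_index('CashFlows')

# Each remaining label is a Timestamp, so .year is available.
df = df.rename(columns=lambda x: 'year_' + str(x.year))

# Bring 'CashFlows' back as an ordinary column.
df = df.reset_index()
print(df)
```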

[Solved] multiply multiple column and find sum of each column for multiple values

January 22, 2023 by Kirat

Try this:

    df <- read.table(text = "v1 v2 v3 v4 v5
    0 1 1 1 1
    0 1 1 0 1
    1 0 1 1 0", skip = 1)
    df
    ll <- vector(mode = "list", length = ncol(df)-1)
    ll <- lapply(2:ncol(df), function(ncols){
      tmp <- t(apply(df, 1, function(rows) combn(x = rows, m = ncols, prod)))
      if(ncols …

[Solved] how to save sql query result to csv in pandas

January 19, 2023 by Kirat

You can try the following code:

    import pandas as pd

    df1 = pd.read_csv("Insert file path")
    df2 = pd.read_csv("Insert file path")
    df1['Date'] = pd.to_datetime(df1['Date'], errors="coerce", format="%Y-%m-%d")
    df2['Date'] = pd.to_datetime(df2['Date'], errors="coerce", format="%Y-%m-%d")
    df = df1.merge(df2, how='inner', on='Date')
    df.to_csv('data.csv', index=False)

This should solve your problem.
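To see what `errors="coerce"` and the inner merge do in practice, here is a self-contained sketch. Everything in it is an assumption for illustration: the two inputs are in-memory stand-ins for the CSV files, and the column names `a` and `b` are invented.

```python
import io
import pandas as pd

# Hypothetical stand-ins for the two CSV files.
csv1 = io.StringIO("Date,a\n2023-01-01,1\n2023-01-02,2\nnot a date,3\n")
csv2 = io.StringIO("Date,b\n2023-01-02,20\n2023-01-03,30\n")

df1 = pd.read_csv(csv1)
df2 = pd.read_csv(csv2)

# Unparseable dates become NaT instead of raising an error.
df1['Date'] = pd.to_datetime(df1['Date'], errors='coerce', format='%Y-%m-%d')
df2['Date'] = pd.to_datetime(df2['Date'], errors='coerce', format='%Y-%m-%d')

# Keep only the dates present in both frames.
merged = df1.merge(df2, how='inner', on='Date')

# Write to an in-memory buffer here; pass a file path to write to disk.
out = io.StringIO()
merged.to_csv(out, index=False)
print(out.getvalue())
```

Only 2023-01-02 survives: it is the one date both inputs share, and the coerced NaT row in the first frame has nothing to match against.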

[Solved] I want to plot the count of over a specific number e,g 2000 [closed]

January 17, 2023 by Kirat

You can assign the result of value_counts() to a Series and filter it as below:

    count = df_1['neighbourhood'].value_counts()
    ax = count[count > 2000].plot(kind='bar', figsize=(14,8), title="Neighbourhood that showed")
    ax.set_xlabel("Neighbourhood")
    ax.set_ylabel("Frequency")
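The filtering step can be checked on its own before plotting. This is a sketch on made-up data: the neighbourhood names are hypothetical and `threshold` is set to 1 as a stand-in for 2000 so the tiny sample has something to show.

```python
import pandas as pd

# Hypothetical sample: 'A' appears 3 times, 'B' twice, 'C' once.
df_1 = pd.DataFrame({'neighbourhood': ['A'] * 3 + ['B'] * 2 + ['C']})

threshold = 1  # stand-in for 2000 on the real data

# value_counts() returns a Series indexed by the distinct values.
count = df_1['neighbourhood'].value_counts()

# Boolean indexing keeps only the entries above the threshold.
frequent = count[count > threshold]
print(frequent)
```

Calling `.plot(kind='bar')` on `frequent` then draws only the bars that passed the filter.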

[Solved] how to calculation cost time [closed]

January 14, 2023 by Kirat

I think I understand what you’re asking. You just want to have a new dataframe that calculates the time difference between the three different entries for each unique order id? So, I start by creating the dataframe:

    # '新建订单' means "new order created"; '初审成功' means "initial review passed"
    data = [
        [11238, 3943, 201805030759165986, '新建订单', 20180503075916, '2018/5/3 07:59:16', '2018/5/3 07:59:16'],
        [11239, 3943, 201805030759165986, '新建订单', 20180503082115, '2018/5/3 08:21:15', '2018/5/3 08:21:15'],
        [11240, 3943, 201805030759165986, '新建订单', 20180503083204, '2018/5/3 08:32:04', '2018/5/3 08:32:04'],
        [11241, 3941, 201805030856445991, '新建订单', 20180503085644, '2018/5/3 08:56:02', '2018/5/3 08:56:44'],
        [11242, 3941, 201805022232081084, '初审成功', 20180503085802, '2018/5/3 08:58:02', '2018/5/3 08:58:02'],
        …

[Solved] Is there any pandas function to merge 3 rows?

January 13, 2023 by Kirat

Let’s take the following sample DataFrame, containing 2 groups of 3 adjacent rows: C1 C2 C3 C4 C5 C6 ABC NaN NaN NaN NaN PK KJ PQR NaN NaN RR SS NaN NaN MNO PO UI NaN NaN NaN XXX AA NaN NaN NaN EE NaN XX1 NaN BB NaN DD NaN FF1 XX2 …

[Solved] Changing a column of a dataframe in R

January 11, 2023 by Kirat

You could use gsub here:

    x <- c("s1-112", "s10-112", "s3656-112")
    gsub("s(.*)-112", "\\1", x)
    # [1] "1"    "10"   "3656"

[Solved] Extracting Data from a list- find the highest value

January 7, 2023 by Kirat

I’m assuming IATA is the ticket agent variable:

    df = data.frame(IATA=c(3300, 3300, 3300, 3300, 3301, 3301, 3302, 3303, 3303))
    table(df$IATA)
    # 3300 3301 3302 3303
    #    4    2    1    2

As you can see, table gives the frequency of ticket sales by each ticket agent.

    names(which.max(table(df$IATA)))
    # [1] "3300"

If there are ties and you …

[Solved] How to store missing date(15 min interval) points from csv into new file (15 minutes interval) -python 3.5

January 6, 2023 by Kirat

Try this:

    In [16]: df.ix[df.groupby(df['datetime'].dt.date)['production'].transform('nunique') < 44 * 4 * 24, 'datetime'].dt.date.unique()
    Out[16]: array([datetime.date(2015, 12, 7)], dtype=object)

This will give you all rows for the “problematic” days:

    df[df.groupby(df['datetime'].dt.date)['production'].transform('nunique') < 44 * 4 * 24]

PS: there is a good reason why people asked you for a good reproducible sample data set – with the one …
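Note that `.ix` was removed in pandas 1.0, so on a current install the same selection needs `.loc`. Below is a sketch of that variant on invented data. The sample frame is hypothetical, and `expected_per_day = 4 * 24` (96 readings per day at 15-minute intervals) is my assumption for the toy data, not the threshold used in the answer.

```python
import pandas as pd

# Hypothetical data: two days of 15-minute readings with unique values.
idx = pd.date_range('2015-12-06', '2015-12-07 23:45', freq='15min')
df = pd.DataFrame({'datetime': idx, 'production': range(len(idx))})

# Remove four readings from the second day to create a gap.
df = df.drop(df.index[100:104]).reset_index(drop=True)

expected_per_day = 4 * 24  # assumption: 96 readings per complete day

# Distinct readings per calendar day, broadcast back to every row.
per_day = df.groupby(df['datetime'].dt.date)['production'].transform('nunique')
incomplete = per_day < expected_per_day

# .loc replaces the removed .ix indexer.
problem_days = df.loc[incomplete, 'datetime'].dt.date.unique()
problem_rows = df[incomplete]
print(problem_days)
```

Counting distinct `production` values only works as a row count when the readings are unique within a day; `transform('size')` would count rows directly.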

[Solved] Add a column for counting unique tuples in the data frame [duplicate]

January 4, 2023 by Kirat

1) aggregate

    ag <- aggregate(count ~ ., cbind(count = 1, df), length)
    ag[do.call("order", ag), ] # sort the rows

giving:

      userID A B count
    3      1 2 2     1
    4      1 3 3     1
    2      3 2 1     2
    1      5 1 0     2

The last line of code which sorts the rows could be …

[Solved] Sort the order of dataframe columns based on the values in the bottom row

December 31, 2022 by Kirat

You were close. Try this:

    import pandas as pd

    df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 2], 'c': [2, 4, 5]})
    print(df)
    df = df[[x for _, x in sorted(zip(df.iloc[-1], df.columns), reverse=True)]]
    print(df)

Starting DataFrame:

       a  b  c
    0  1  4  2
    1  2  5  4
    2  3  2  5

Columns sorted …
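A possible alternative, not the approach in the answer above: let pandas do the sorting with `sort_values` on the last row and reuse its index as the column order. The names `order` and `sorted_df` are just for illustration.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 2], 'c': [2, 4, 5]})

# The bottom row is a Series indexed by column name; sorting it in
# descending order yields the desired column order in its index.
order = df.iloc[-1].sort_values(ascending=False).index

# Reindex the columns in that order.
sorted_df = df[order]
print(sorted_df)
```

With the bottom row holding a=3, b=2, c=5, the descending order is c, a, b. Ties may be ordered differently than in the `sorted(zip(...))` version, which falls back to comparing column names.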