Hello everyone! Today I want to write about the pandas library. Here are 30 things you can do with pandas to better understand your data!
First things first, let's import the pandas library:
import pandas as pd
df = pd.read_csv('test.csv')  # read a test file into a dataframe
(1) Read in a CSV dataset
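pd.read_csv("csv_file")
# Note: older tutorials also show pd.DataFrame.from_csv("csv_file"), but that method was deprecated and then removed in pandas 1.0, so use pd.read_csv.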
(2) Read in an Excel dataset
pd.read_excel("excel_file")
(3) Write your data frame directly to csv
df.to_csv("data.csv", sep=",", index=False)
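To see what index=False does, here is a small sketch that writes a dataframe to CSV and reads it back. It uses an in-memory io.StringIO buffer instead of a real file so it runs anywhere; the dataframe df_out and its values are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical data, invented for this example.
df_out = pd.DataFrame({"size": [5, 3], "y": ["a", "b"]})

# Write to an in-memory buffer instead of "data.csv".
buffer = io.StringIO()
df_out.to_csv(buffer, sep=",", index=False)  # index=False leaves out the row labels
csv_text = buffer.getvalue()

# Rewind the buffer and read the CSV back into a new dataframe.
buffer.seek(0)
df_back = pd.read_csv(buffer)
print(df_back)
```

Without index=False, the row labels would be written as an extra unnamed first column, which then shows up as "Unnamed: 0" when you read the file back.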
(4) Create a dataframe from data with column names
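A minimal sketch of this step, assuming a tiny made-up dataset. The column names size, y and z are borrowed from the later examples in this post; the values are invented.

```python
import pandas as pd

# Hypothetical rows; the values are made up for illustration only.
rows = [[5, "a", "x"], [3, "b", "y"], [8, "a", "y"]]

# Pass the column names explicitly through the `columns` argument.
df_example = pd.DataFrame(rows, columns=["size", "y", "z"])

# A dict that maps column name -> list of values works as well.
df_from_dict = pd.DataFrame(
    {"size": [5, 3, 8], "y": ["a", "b", "a"], "z": ["x", "y", "y"]}
)
print(df_example)
```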
# Sum of values in a data frame
df.sum()
# Lowest value of a data frame
df.min()
# Highest value
df.max()
# Index of the lowest value
df.idxmin()
# Index of the highest value
df.idxmax()
# Statistical summary of the data frame, with quartiles, median, etc.
df.describe()
# Average values
df.mean()
# Median values
df.median()
# Correlation between columns
df.corr()
# To get these values for only one column, select the column first, like this:
df["size"].median()
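Here is a short sketch that runs several of these statistics on a small made-up dataframe (df_stats and its values are invented for illustration). One thing to keep in mind: in recent pandas versions (2.0 and later), mean(), median() and corr() raise an error when the dataframe contains text columns, unless you pass numeric_only=True.

```python
import pandas as pd

# Hypothetical data: two numeric columns and one text column.
df_stats = pd.DataFrame(
    {
        "size": [5, 3, 8, 4],
        "weight": [1.0, 2.5, 0.5, 2.0],
        "y": ["a", "b", "a", "b"],
    }
)

means = df_stats.mean(numeric_only=True)     # column averages, text column skipped
corr = df_stats.corr(numeric_only=True)      # correlation matrix of numeric columns
summary = df_stats.describe()                # count, mean, std, min, quartiles, max

size_median = df_stats["size"].median()      # statistic for a single column
smallest_at = df_stats["size"].idxmin()      # index label of the smallest size
print(summary)
```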
(24) Sorting your data
df.sort_values(by="size", ascending=False)  # `by` is required for a DataFrame
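On a dataframe, sort_values needs the by argument so pandas knows which column (or columns) to sort on. A small sketch with made-up data (df_sort is a hypothetical name):

```python
import pandas as pd

# Hypothetical data, invented for this example.
df_sort = pd.DataFrame({"size": [5, 3, 8], "y": ["a", "b", "c"]})

# Sort by "size", largest first. The original dataframe is not modified.
sorted_desc = df_sort.sort_values(by="size", ascending=False)

# The old row labels travel with the rows; reset them if you want 0, 1, 2 again.
sorted_reset = sorted_desc.reset_index(drop=True)
print(sorted_desc)
```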
(25) Boolean indexing
df[df["size"] == 5]
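The comparison inside the brackets produces a column of True/False values (a mask), and the dataframe keeps only the rows where the mask is True. A quick sketch with made-up data, including how to combine two conditions:

```python
import pandas as pd

# Hypothetical data, invented for this example.
df_bool = pd.DataFrame({"size": [5, 3, 5, 8], "y": ["a", "b", "b", "a"]})

mask = df_bool["size"] == 5          # Series of True/False, one per row
only_fives = df_bool[mask]           # keeps rows 0 and 2

# Combine conditions with & (and) or | (or); each condition needs its own parentheses.
fives_and_a = df_bool[(df_bool["size"] == 5) & (df_bool["y"] == "a")]
print(fives_and_a)
```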
(26) Selecting values
df.loc[[0], ['size']]
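Note that .loc is indexed with square brackets, not called like a function. Here is a small sketch (made-up data) showing how the result changes depending on whether you pass lists or single labels:

```python
import pandas as pd

# Hypothetical data, invented for this example.
df_sel = pd.DataFrame({"size": [5, 3, 8], "y": ["a", "b", "c"]})

as_frame = df_sel.loc[[0], ["size"]]  # lists in, so a 1x1 DataFrame comes out
as_scalar = df_sel.loc[0, "size"]     # single labels in, so a single value comes out
by_position = df_sel.iloc[0, 0]       # .iloc selects by integer position instead of label
print(as_frame)
```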
(27) Cross-frequency table between two variables
pd.crosstab(df["y"],df["z"])
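pd.crosstab counts how often each combination of values occurs. A minimal sketch with made-up categorical data (the column names y and z follow the call above; the values are invented):

```python
import pandas as pd

# Hypothetical data, invented for this example.
df_ct = pd.DataFrame(
    {"y": ["a", "a", "b", "b", "a"], "z": ["x", "y", "x", "x", "x"]}
)

# Rows are the values of y, columns are the values of z, cells are counts.
table = pd.crosstab(df_ct["y"], df_ct["z"])
print(table)
```

Combinations that never occur (here y == "b" together with z == "y") show up as 0.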
(28) Plot function for numeric columns
df["size"].plot()
(29) Get shape (row,columns) of the DataFrame
df.shape
(30) Get n randomly selected rows from the DataFrame
df.sample(n)  # n is the number of rows to draw, e.g. df.sample(5)
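If you want the same random rows every time you run your script, sample also accepts a random_state seed. A small sketch with a made-up dataframe (the seed value 0 is an arbitrary choice):

```python
import pandas as pd

# Hypothetical data: ten rows with sizes 0..9.
df_rand = pd.DataFrame({"size": range(10)})

# Draw 3 rows at random; a fixed random_state makes the draw repeatable.
picked = df_rand.sample(n=3, random_state=0)
picked_again = df_rand.sample(n=3, random_state=0)
print(picked)
```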
There are many more useful things in pandas. We’ll see more about them in upcoming posts.
"Happy Reading, Happy Learning"