Pandas Cheat Sheet for Data Science in Python

What is the Pandas Cheat Sheet?
The Pandas library has many functions, and some of them confuse some users. Here we provide a handy resource, a Python cheat sheet for Pandas, that explains the basics of Pandas simply and concisely.
Whether you are a beginner or already have experience with Pandas, this cheat sheet can serve as a useful reference guide. It covers a range of topics, including working with the Series and DataFrame data structures, selecting and sorting data, and applying functions to your data.
In short, this Pandas Python Cheat Sheet is a good resource for anyone who wants to learn more about using Python for Data Science. It is a handy reference tool that can help you improve your data analysis skills and work more efficiently with Pandas.
Explanation of important functions in Pandas:
To start working with Pandas functions, you need to install and import Pandas. Two commands do this:
Step 1) # Install Pandas
pip install pandas
Step 2) # Import Pandas
import pandas as pd
Now you can start working with Pandas functions. We will work on manipulating, analyzing, and cleaning data. Here are some important Pandas functions.
Pandas Data Structures
As mentioned above, Pandas has two data structures called Series and DataFrame. Both are labeled arrays and can hold any data type. The only difference is that a Series is a one-dimensional array and a DataFrame is a two-dimensional array.
1. Series
It is a one-dimensional labeled array. It can hold any data type.
s = pd.Series([2, -4, 6, 3, None], index=['A', 'B', 'C', 'D', 'E'])
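A small sketch of what the Series above looks like once it is created. It assumes only the values shown in the cheat sheet; the variable names `dtype_name` and `missing_labels` are illustrative.

```python
import pandas as pd

# The Series from the cheat sheet; None is stored as NaN
s = pd.Series([2, -4, 6, 3, None], index=['A', 'B', 'C', 'D', 'E'])

# Integers mixed with a missing value are upcast to float64
dtype_name = str(s.dtype)

# Labels whose value is missing
missing_labels = s[s.isnull()].index.tolist()
```

Because of the missing value, the integers are stored as floats, which is worth remembering before comparing or exporting the data.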
2. DataFrame
It is a two-dimensional labeled array. It can hold any data type, and each column can have a different type.
data = {'RollNo' : [101, 102, 75, 99],
'Name' : ['Mithlesh', 'Ram', 'Rudra', 'Mithlesh'],
'Course' : ['Nodejs', None, 'Nodejs', 'JavaScript']
}
df = pd.DataFrame(data, columns=['RollNo', 'Name', 'Course'])
df.head()
Importing Data
Pandas can import or read various types of files into your notebook.
Here are some examples.
# Import a CSV file
pd.read_csv(filename)
# Import a TSV file
pd.read_table(filename)
# Import an Excel file
pd.read_excel(filename)
# Import a SQL table/database
pd.read_sql(query, connection_object)
# Import a JSON file
pd.read_json(json_string)
# Import the tables of an HTML page (returns a list of DataFrames)
pd.read_html(url)
# Read the clipboard contents and parse them like a delimited text file
pd.read_clipboard()
# From a dict (such as data above)
pd.DataFrame(data)
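To try the CSV reader without a file on disk, one option is to wrap a string in `io.StringIO`. This is only a sketch: the CSV text below is made-up sample data modeled on the cheat sheet's DataFrame.

```python
import io

import pandas as pd

# Hypothetical CSV content; the second row has an empty Course field
csv_text = (
    "RollNo,Name,Course\n"
    "101,Mithlesh,Nodejs\n"
    "102,Ram,\n"
    "75,Rudra,Nodejs\n"
)

# read_csv accepts any file-like object, not only a path
df_csv = pd.read_csv(io.StringIO(csv_text))

# Empty fields are read as missing values
missing_courses = int(df_csv['Course'].isnull().sum())
```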
Selection
You can select elements by their location or index. With these techniques you can select rows, columns, and distinct values.
1. Series
# Accessing one element from the Series
s['D']
# Accessing all elements between two given labels (both ends included)
s['A':'C']
# Accessing all elements from the start up to a given label
s[:'C']
# Accessing all elements from a given label to the end
s['B':]
2. DataFrame
# Accessing one column of df
df['Name']
# Accessing all rows from position 1 onward
df[1:]
# Accessing the rows before position 1
df[:1]
# Accessing the rows from position 1 up to, but not including, position 2
df[1:2]
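The slices above behave differently depending on whether they use labels or positions. A minimal sketch of the difference, using the cheat sheet's Series and a reduced, one-column version of its DataFrame:

```python
import pandas as pd

s = pd.Series([2, -4, 6, 3, None], index=['A', 'B', 'C', 'D', 'E'])

# A label slice includes both endpoints: A, B and C
by_label = s['A':'C']

# A positional slice excludes the stop position: A and B only
by_position = s.iloc[0:2]

df = pd.DataFrame({'RollNo': [101, 102, 75, 99]})

# Integer slicing on a DataFrame selects rows by position: only row 1
rows = df[1:2]
```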
Selecting by Boolean Indexing and Setting
1. By Position
df.iloc[0, 1]
df.iat[0, 1]
2. By Label
df.loc[[0], ['Name']]
3. By Label/Position
# Both return the same row here because df has the default 0..n-1 index
df.loc[2]
df.iloc[2]
4. Boolean Indexing
# Series s where value is > 0
s[(s > 0)]
# Series s where value is < -2 or > 1
s[(s < -2) | (s > 1)]
# Use a filter to select DataFrame rows
df[df['RollNo']>100]
# Set the value at index 'D' of Series s to 10
s['D'] = 10
s.head()
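`loc` and `iloc` agree only while the row labels happen to equal the row positions. The sketch below uses a hypothetical DataFrame with a shuffled integer index to show where they diverge.

```python
import pandas as pd

# Hypothetical index order chosen so that labels differ from positions
df = pd.DataFrame({'Name': ['Mithlesh', 'Ram', 'Rudra']}, index=[2, 0, 1])

# loc looks up the row whose label is 2 (the first row here)
by_label = df.loc[2, 'Name']

# iloc looks up the row at position 2 (the third row here)
by_position = df.iloc[2, 0]
```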
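Boolean masks can be combined with `|`, `&` and `~`. The following sketch uses slightly different thresholds from the snippets above (an illustrative choice) so that the selected and rejected sets are both non-empty. Note that comparisons against a missing value are always False.

```python
import pandas as pd

s = pd.Series([2, -4, 6, 3, None], index=['A', 'B', 'C', 'D', 'E'])

# True where the value is below -2 or above 2; NaN compares as False
mask = (s < -2) | (s > 2)

# Values that satisfy the condition
selected = s[mask]

# ~ inverts the mask, so the NaN entry ends up on this side
rejected = s[~mask]
```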
Data Cleaning
For data cleaning in Python, you can perform the following operations:
- Rename columns using the rename() method.
- Update values using the at[] or iat[] accessors to access and modify specific elements.
- Create a copy of a Series or DataFrame using the copy() method.
- Check for NULL values using the isnull() method and drop them using the dropna() method.
- Check for duplicate values using the duplicated() method. Drop them using the drop_duplicates() method.
- Replace NULL values with a specified value using the fillna() method.
- Replace values using the replace() method.
- Sort values using the sort_values() method.
- Rank values using the rank() method.
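Several of the operations in this list can be chained into one expression. This is a minimal sketch on the cheat sheet's sample data; the replacement value 'Unknown' and the name `cleaned` are assumptions made for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'RollNo': [101, 102, 75, 99],
    'Name': ['Mithlesh', 'Ram', 'Rudra', 'Mithlesh'],
    'Course': ['Nodejs', None, 'Nodejs', 'JavaScript'],
})

cleaned = (
    df.rename(columns={'RollNo': 'ID'})      # rename one column
      .drop_duplicates(subset=['Name'])      # keep the first row for each Name
      .fillna({'Course': 'Unknown'})         # fill missing Course values only
      .sort_values('ID')                     # ascending by ID
)
```

Each method returns a new DataFrame, so `df` itself is left unchanged.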
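# Renaming all columns at once (on a copy, so later examples can still use the original names)
df_abc = df.copy()
df_abc.columns = ['a','b','c']
df_abc.head()
# Renaming selected columns; rename() returns a new DataFrame and leaves df unchanged
df_renamed = df.rename(columns={'RollNo': 'ID', 'Name': 'Student_Name'})
df_renamed.head()
# Or pass inplace=True to edit the same DataFrame instead of a copy
# df.rename(columns={'RollNo': 'ID', 'Name': 'Student_Name'}, inplace=True)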
# Flagging rows whose Name repeats an earlier row (True = duplicate)
df.duplicated(subset='Name')
# Removing the entire row that has a duplicate in the given column
df.drop_duplicates(subset=['Name'])
# You can choose which occurrence to keep; the default is the first
df.drop_duplicates(subset=['Name'], keep='last')
# Checks for Null Values
s.isnull()
# Checks for non-Null Values - reverse of isnull()
s.notnull()
# Checks for Null Values in df
df.isnull()
# Checks for non-Null Values - reverse of isnull()
df.notnull()
# Drops all rows that contain null values
df.dropna()
# Drops all columns that contain null values
df.dropna(axis=1)
# Replaces all null values with 'Guru99'
df.fillna('Guru99')
# Replaces all null values with the mean
s.fillna(s.mean())
# Converts the datatype of the Series to float
s.astype(float)
# Replaces all values equal to 6 with 'Six'
s.replace(6,'Six')
# Replaces all 2 with 'Two' and 6 with 'Six'
s.replace([2,6],['Two','Six'])
# Drop from rows (axis=0)
s.drop(['B', 'D'])
# Drop from columns(axis=1)
df.drop('Name', axis=1)
# Sort by labels with axis
df.sort_index()
# Sort by values with axis
df.sort_values(by='RollNo')
# Ranking entries
df.rank()
# s1 is pointing to same Series as s
s1 = s
# s_copy of s, but not pointing same Series
s_copy = s.copy()
# df1 is pointing to same DataFrame as df
df1 = df
# df_copy of df, but not pointing same DataFrame
df_copy = df.copy()
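The difference between plain assignment and copy() shows up as soon as the original is modified. A small sketch with a shortened Series:

```python
import pandas as pd

s = pd.Series([2, -4, 6], index=['A', 'B', 'C'])

s1 = s               # another name for the same object
s_copy = s.copy()    # an independent copy of the data

# Modify the original after both names were created
s['A'] = 100

same_object = s1 is s        # the alias sees the change
alias_value = s1['A']
copy_value = s_copy['A']     # the copy keeps the old value
```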
Retrieving Information
To retrieve information, you can perform these operations:
- Use the shape attribute to get the number of rows and columns.
- Use the head() or tail() method to get the first or last few rows as a sample.
- Use the info() method, the describe() method, or the dtypes attribute to get information about data types, counts, mean, standard deviation, and minimum and maximum values.
- Use the count(), min(), max(), sum(), mean(), and median() methods to get specific statistics about the values.
- Use the loc[] indexer to get a row.
- Use the groupby() method to apply a GROUP BY operation that groups similar values in a DataFrame column.
1. Basic Information
# Counting all elements in Series
len(s)
# Counting all rows in DataFrame
len(df)
# Number of rows and columns in the DataFrame
df.shape
# First 10 rows (5 by default if no value is set)
df.head(10)
# Last 10 rows (5 by default if no value is set)
df.tail(10)
# Counting non-null values column-wise
df.count()
# Index range of df
df.index
# Names of attributes/columns
df.columns
# Index, data type and memory information
df.info()
# Data type of each column
df.dtypes
# Summary statistics for numerical columns
df.describe()
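A quick check of how head() and tail() behave with and without an argument. The seven-row DataFrame and its column name `n` are made up for this sketch.

```python
import pandas as pd

# Hypothetical DataFrame with 7 rows holding the values 0..6
df = pd.DataFrame({'n': range(7)})

default_head = df.head()     # no argument: first 5 rows
ten_rows = df.head(10)       # asks for 10, but only 7 rows exist
last_two = df.tail(2)        # last 2 rows

n_rows, n_cols = df.shape    # shape is an attribute, not a method
```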
2. Summary
# For adding all values column-wise
df.sum()
# For min column-wise
df.min()
# For max column-wise
df.max()
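# For mean value of the numeric columns (numeric_only avoids a TypeError on text columns in pandas 2.x)
df.mean(numeric_only=True)
# For median value of the numeric columns
df.median(numeric_only=True)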
# Count non-Null values
s.count()
# Count non-Null values
df.count()
# Return the values of the given column as a list
df['Name'].tolist()
# Name of columns
df.columns.tolist()
# Creating subset
df[['Name', 'Course']]
# Return number of values in each group
df.groupby('Name').count()
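count() on a grouped DataFrame counts non-null values per column, which is not always the same as the number of rows in each group. The sketch below contrasts it with size() and adds a per-group mean, using the cheat sheet's sample data.

```python
import pandas as pd

df = pd.DataFrame({
    'RollNo': [101, 102, 75, 99],
    'Name': ['Mithlesh', 'Ram', 'Rudra', 'Mithlesh'],
    'Course': ['Nodejs', None, 'Nodejs', 'JavaScript'],
})

# Non-null values per column within each Name group
counts = df.groupby('Name').count()

# Number of rows in each group, missing values included
sizes = df.groupby('Name').size()

# Mean RollNo per Course; rows with a missing Course are left out by default
mean_by_course = df.groupby('Course')['RollNo'].mean()

# Overall mean of the numeric column
overall_mean = df['RollNo'].mean()
```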
Applying Functions
# Define function
f = lambda x: x*5
# Apply this function to the given Series, value by value
s.apply(f)
# Apply this function to the given DataFrame, column by column
df.apply(f)
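Series.apply calls the function once per value, while DataFrame.apply passes a whole column (or a whole row with axis=1). A sketch with a small numeric DataFrame whose column names `a` and `b` are invented for the example:

```python
import pandas as pd

s = pd.Series([2, -4, 6])
f = lambda x: x * 5

# Called once per value
scaled = s.apply(f)

df = pd.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30]})

# Called once per column; each call receives a Series
col_range = df.apply(lambda col: col.max() - col.min())

# With axis=1 the function is called once per row
row_sum = df.apply(lambda row: row['a'] + row['b'], axis=1)
```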
1. Internal Data Alignment
# NA values for indices that don't overlap
s2 = pd.Series([8, -1, 4], index=['A', 'C', 'D'])
s + s2
2. Arithmetic Operations with Fill Methods
# Fill values that don't overlap
s.add(s2, fill_value=0)
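The two additions above can be compared side by side. This sketch reuses the cheat sheet's `s` and `s2`. Label E stays missing even with fill_value, because it has no value in either Series.

```python
import pandas as pd

s = pd.Series([2, -4, 6, 3, None], index=['A', 'B', 'C', 'D', 'E'])
s2 = pd.Series([8, -1, 4], index=['A', 'C', 'D'])

# Plain addition aligns on labels; labels missing on either side give NaN
plain = s + s2

# fill_value substitutes 0 where only one side is missing
filled = s.add(s2, fill_value=0)
```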
3. Filter, Sort, and Group By
The following functions can be used to filter, sort, and group Series and DataFrame objects.
# Filter rows where column is greater than 100
df[df['RollNo']>100]
# Filter rows where 70 < column < 101
df[(df['RollNo'] > 70) & (df['RollNo'] < 101)]
# Sorts values in ascending order
s.sort_values()
# Sorts values in descending order
s.sort_values(ascending=False)
# Sorts values by RollNo in ascending order
df.sort_values('RollNo')
# Sorts values by RollNo in descending order
df.sort_values('RollNo', ascending=False)
Exporting Data
Pandas can export or write data in various formats. Here are some examples.
# Export as a CSV file
df.to_csv(filename)
# Export as an Excel file
df.to_excel(filename)
# Export as a SQL table
df.to_sql(table_name, connection_object)
# Export as a JSON file
df.to_json(filename)
# Export as an HTML table
df.to_html(filename)
# Write to the clipboard
df.to_clipboard()
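One way to check an export is to write to an in-memory buffer and read the result back. This is a sketch under that assumption; the two-row DataFrame is a shortened version of the cheat sheet's data.

```python
import io

import pandas as pd

df = pd.DataFrame({'RollNo': [101, 102], 'Name': ['Mithlesh', 'Ram']})

# Write CSV text into a buffer instead of a file; index=False omits the row labels
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()

# Read the same text back into a new DataFrame
restored = pd.read_csv(io.StringIO(csv_text))
```

Without index=False, the row labels would be written as an extra unnamed first column and read back as data.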
Pandas Cheat Sheet Conclusion:
Pandas is an open-source Python library for working with data sets. It can analyze, clean, explore, and manipulate data. Pandas is built on top of NumPy and is used together with other libraries such as Matplotlib and scikit-learn. This cheat sheet covers topics such as data structures, data selection, importing data, Boolean indexing, dropping values, sorting, and data cleaning. We have also prepared a PDF version of the cheat sheet to accompany the article. Data science uses Pandas for working with DataFrames and Series, and this cheat sheet has walked through a variety of Pandas commands.
Colab of Cheat Sheet
My Colab exercise file for Pandas: Pandas Cheat Sheet - Python for Data Science.ipynb

