We often create data in NumPy arrays and then need to convert it to a pandas DataFrame so we can work with pandas methods.
Converting NumPy arrays (ndarrays) to a pandas DataFrame makes data analysis more convenient. In this tutorial, we'll take a closer look at some of the common approaches for converting a NumPy array to a pandas DataFrame.
We'll also look at some common tricks for handling NumPy arrays with different structures and value types when converting them to a pandas DataFrame.
Creating NumPy arrays (ndarrays)
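NumPy arrays are multidimensional arrays. All elements of an array share a single dtype, so to hold heterogeneous data (for example, numbers and strings together) an array needs dtype=object or a structured dtype.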
There are several ways that we can create a NumPy array.
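To see what "a single dtype" means in practice, here is a small sketch (the variable names are our own) that checks the dtype NumPy picks for a mixed list. Without an explicit dtype, NumPy converts every element to one common type, here a Unicode string, while dtype=object keeps the original Python objects.

```python
import numpy as np

# A mixed list: two integers and a string
mixed = [25, 'Karlos', 2015]

# Without an explicit dtype, NumPy coerces every element to one common type
as_strings = np.array(mixed)
print(as_strings.dtype)       # a Unicode string dtype, e.g. <U21

# dtype=object keeps each element as the original Python object
as_objects = np.array(mixed, dtype=object)
print(as_objects.dtype)       # object
print(type(as_objects[0]))    # the first element is still a Python int
```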
Method 1: Use the arange() method. It creates a sequence of evenly spaced values; called with a single argument n, it returns the integers from 0 up to, but not including, n. Here is a code snippet showing how to use it.
import numpy as np

arry = np.arange(20)  # integers 0 through 19
print(arry)
This is a one-dimensional array.
Method 2: Use a list and numpy.array(). In this technique, we pass a Python list to numpy.array() to convert it to an array. Here is a code snippet showing how to use it.
import numpy as np

li = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
arry = np.array(li)
print(arry)
A DataFrame is two-dimensional, so a two-dimensional array is the natural starting point: its rows and columns map directly onto the DataFrame's rows and columns. There are two common ways to create one:
Using arange() and reshape(): We can chain these two methods to generate a sequence of values and then arrange it in the required shape. Here is a code snippet showing how to use it.
import numpy as np

# 24 values (0-23) arranged as 8 rows x 3 columns
arry = np.arange(24).reshape(8, 3)
print(arry)
Using a nested list and numpy.array(): In this technique, we pass a nested list to numpy.array() to convert it to a two-dimensional array. Here is a code snippet showing how to use it.
import numpy as np

li = [[10, 20, 30, 40], [42, 52, 62, 72]]  # nested list: two rows of four values
arry = np.array(li)
print(arry)
Converting a homogeneous NumPy array (ndarray) using the DataFrame constructor
A DataFrame in pandas is a two-dimensional collection of data in rows and columns. It can store both homogeneous and heterogeneous data, because each column can have its own dtype.
We need to use the DataFrame() constructor to create a DataFrame from a NumPy array. Here is a code snippet showing how to use it.
import numpy as np
import pandas as pd

li = [[10, 20, 30, 40], [42, 52, 62, 72]]
arry = np.array(li)

dataf = pd.DataFrame(arry)
print(dataf)
print()
print(type(dataf))
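The constructor keeps the array's shape and dtype. As a quick sketch, we can confirm this and convert the DataFrame back with DataFrame.to_numpy(), which returns an ndarray holding the same values:

```python
import numpy as np
import pandas as pd

arry = np.array([[10, 20, 30, 40], [42, 52, 62, 72]])
dataf = pd.DataFrame(arry)

# Shape and dtype carry over: 2 rows, 4 columns, all of the array's integer dtype
print(dataf.shape)
print(dataf.dtypes.unique())

# to_numpy() converts the DataFrame back into an ndarray with the same values
back = dataf.to_numpy()
print(np.array_equal(back, arry))
```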
Add column names and an index to the converted DataFrame
We can use the columns and index parameters of DataFrame() to set the column names and the index labels of the DataFrame.
By default, both the column labels and the index labels are integers that start at 0 and increase by 1. Here's an example of a DataFrame that specifies the columns and index.
import numpy as np
import pandas as pd

li = [[10, 20, 30, 40], [42, 52, 62, 72]]
arry = np.array(li)

# index labels the rows, columns labels the columns
dataf = pd.DataFrame(arry, index=['R1', 'R2'], columns=['ColA', 'ColB', 'ColC', 'ColD'])
print(dataf)
print()
print(type(dataf))
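The columns and index lists must match the array's shape. A short sketch of both sides of this: if the number of labels is wrong, pandas raises a ValueError, and labels can also be assigned after construction through the columns and index attributes.

```python
import numpy as np
import pandas as pd

arry = np.array([[10, 20, 30, 40], [42, 52, 62, 72]])

# Three labels for a four-column array: pandas rejects the mismatch
try:
    pd.DataFrame(arry, columns=['ColA', 'ColB', 'ColC'])
except ValueError as err:
    print("ValueError:", err)

# Labels can also be set after the DataFrame is created
dataf = pd.DataFrame(arry)
dataf.columns = ['ColA', 'ColB', 'ColC', 'ColD']
dataf.index = ['R1', 'R2']
print(dataf)
```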
Convert a heterogeneous NumPy array to a DataFrame
We can also create a DataFrame from a NumPy array built from a nested list of heterogeneous values.
To do so, we pass the ndarray to the DataFrame() constructor and set the column names. Because NumPy coerces mixed values to a single type by default, the array is created with dtype=object so that each value keeps its original type.
Here is an example of a DataFrame with heterogeneous data.
import numpy as np
import pandas as pd

# dtype=object lets one array hold integers and strings together
arry = np.array([[25, 'Karlos', 2015], [21, 'Gaurav', 2016], [22, 'Dee', 2018]],
                dtype=object)
df = pd.DataFrame(arry, columns=['Age', 'Student_Name', 'Passing Year'],
                  index=[1, 2, 3])
print(df)
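Because the array was created with dtype=object, the numeric columns of the resulting DataFrame also come out with the object dtype. A minimal sketch of one way to recover proper numeric dtypes is DataFrame.infer_objects():

```python
import numpy as np
import pandas as pd

arry = np.array([[25, 'Karlos', 2015], [21, 'Gaurav', 2016], [22, 'Dee', 2018]],
                dtype=object)
df = pd.DataFrame(arry, columns=['Age', 'Student_Name', 'Passing Year'],
                  index=[1, 2, 3])
print(df.dtypes)  # Age and Passing Year are object columns here

# infer_objects() re-infers a better dtype for each object column
df = df.infer_objects()
print(df.dtypes)  # Age and Passing Year become integer columns

# Numeric operations now work on a proper integer column
print(df['Age'].mean())
```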
Create DataFrame from NumPy array by columns
Another approach is to build the DataFrame column by column, using two-dimensional ndarray indexing: arry[:, j] selects every row of column j.
Each selected column becomes one column of the DataFrame, just as it is a column of the original matrix. Here is an example showing how to use it.
import numpy as np
import pandas as pd

arry = np.array([[10, 20, 30, 40], [15, 18, 20, 23], [51, 42, 33, 24]])
print(arry, "\n")

# Create pandas DataFrame: each array column becomes a DataFrame column
myDat = pd.DataFrame({'col_1': arry[:, 0],
                      'col_2': arry[:, 1],
                      'col_3': arry[:, 2],
                      'col_4': arry[:, 3]})
print(myDat)
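Writing one dictionary entry per column gets tedious for wide arrays. As a sketch, a dictionary comprehension over arry.shape[1] builds the same column-by-column DataFrame for any number of columns (the col_ naming scheme simply mirrors the example above):

```python
import numpy as np
import pandas as pd

arry = np.array([[10, 20, 30, 40], [15, 18, 20, 23], [51, 42, 33, 24]])

# One entry per column: 'col_1' -> arry[:, 0], 'col_2' -> arry[:, 1], ...
myDat = pd.DataFrame({f'col_{j + 1}': arry[:, j] for j in range(arry.shape[1])})
print(myDat)
```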
Create DataFrame from NumPy array by rows
Another approach is to build the DataFrame row by row, using two-dimensional ndarray indexing: arry[i, :] selects every column of row i. Each selected row becomes one column of the DataFrame, so the result is the transpose of the original array. Here is an example showing how to use it.
import numpy as np
import pandas as pd

arry = np.array([[10, 20, 30, 40], [15, 18, 20, 23], [51, 42, 33, 24]])
print(arry, "\n")

# Create pandas DataFrame: each array row becomes a DataFrame column
myDat = pd.DataFrame({'row_1': arry[0, :],
                      'row_2': arry[1, :],
                      'row_3': arry[2, :]},
                     index=['Column1', 'Column2', 'Column3', 'Column4'])
print(myDat)
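Since each row of the array becomes a column of the DataFrame, the result is the transpose of the array. A sketch of a shorter equivalent transposes with .T and passes the labels directly:

```python
import numpy as np
import pandas as pd

arry = np.array([[10, 20, 30, 40], [15, 18, 20, 23], [51, 42, 33, 24]])

# arry.T has shape (4, 3): each original row is now a column
myDat = pd.DataFrame(arry.T,
                     columns=['row_1', 'row_2', 'row_3'],
                     index=['Column1', 'Column2', 'Column3', 'Column4'])
print(myDat)
```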
Concatenate a NumPy array to a pandas DataFrame
We can also combine NumPy arrays with a pandas DataFrame by creating a second DataFrame from an ndarray and assigning one of its columns to the first DataFrame with the assignment operator (=). Here is a code snippet showing how to implement it.
import numpy as np
import pandas as pd

ary = np.array([['India', 91], ['USA', 1], ['France', 33]], dtype=object)
print(ary)
print(type(ary), "\n")

df = pd.DataFrame(ary, columns=['CountryName', 'PhoneCode'])

# Build a second DataFrame from another ndarray
arr1 = np.array([['Jio'], ['Airtel'], ['AT&T']], dtype=object)
df2 = pd.DataFrame(arr1, columns=['Brand'])

# Assign its column to the first DataFrame
df['Brand_Name'] = df2['Brand']
print(df)
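One detail worth knowing: when the right-hand side is a pandas Series (such as df2['Brand']), the assignment matches rows by index label, not by position. It works above because both DataFrames share the default index 0, 1, 2. A small sketch with a deliberately shifted index shows the difference; assigning the underlying ndarray instead copies values by position:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([['India', 91], ['USA', 1], ['France', 33]], dtype=object),
                  columns=['CountryName', 'PhoneCode'])

# Same brands, but with index labels 1, 2, 3 instead of 0, 1, 2
brands = pd.Series(['Jio', 'Airtel', 'AT&T'], index=[1, 2, 3])

# Series assignment aligns on labels: row 0 gets NaN and label 3 is dropped
df['Aligned'] = brands

# Assigning the raw ndarray ignores labels and fills the rows in order
df['Positional'] = brands.to_numpy()
print(df)
```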
Add a NumPy array as a new column in a DataFrame
We can also add a single-column 2D NumPy array directly to a pandas DataFrame. To do this, we wrap the array (built from a nested list) in a pandas DataFrame and assign it to a new column name on the existing DataFrame.
Here is a code snippet showing how to add a new column based on a NumPy array directly with a column name.
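import numpy as np
import pandas as pd

# 3 x 3 DataFrame holding the values 4 through 12
df = pd.DataFrame(np.arange(4, 13).reshape(3, 3))

# Wrap a one-column array in a DataFrame and assign it as a new column
df['New_Col'] = pd.DataFrame(np.array([[2], [4], [6]]))
print(df)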
NumPy array to DataFrame with concat()
concat() is another powerful pandas function that concatenates two DataFrames into a new one. We can use it to combine an existing DataFrame with a NumPy array by first wrapping the array in a DataFrame.
The syntax is: pandas.concat([dataframe1, pandas.DataFrame(ndarray)], axis=1), where axis=1 places the new columns next to the existing ones. Here is the code snippet showing how to implement it.
import numpy as np
import pandas as pd

df = pd.DataFrame({'value1': [25, 12, 15, 14, 19],
                   'value2': [52, 17, 12, 9, 41],
                   'value3': [10, 30, 15, 11, 14]})
newArr = np.array([[12, 13], [11, 10], [22, 17], [18, 27], [31, 14]])

# Wrap the array in a DataFrame and join it column-wise (axis=1)
new_df = pd.concat([df, pd.DataFrame(newArr)], axis=1)
print(new_df)
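By default, the columns that come from the array are labeled 0 and 1. A sketch that names them while wrapping the array, so the combined DataFrame has readable headers (value4 and value5 are names chosen here for illustration). Note that concat() with axis=1 aligns rows on the index; both pieces here share the default index 0 to 4.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'value1': [25, 12, 15, 14, 19],
                   'value2': [52, 17, 12, 9, 41],
                   'value3': [10, 30, 15, 11, 14]})
newArr = np.array([[12, 13], [11, 10], [22, 17], [18, 27], [31, 14]])

# Name the new columns while wrapping the array; axis=1 joins them side by side
new_df = pd.concat([df, pd.DataFrame(newArr, columns=['value4', 'value5'])], axis=1)
print(new_df)
```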
Convert NumPy Array to DataFrame with random.rand() and reshape()
We can generate some random numbers with random.rand(), which returns floats in the interval [0, 1), and reshape them into a two-dimensional NumPy array with reshape().
Then we can convert it to a DataFrame. Here is a code snippet showing how to implement it.
import numpy as np
import pandas as pd

arry = np.random.rand(8).reshape(2, 4)  # 8 random floats, arranged as 2 x 4
print("Numpy array:")
print(arry)

# convert numpy array to dataframe
df = pd.DataFrame(arry, columns=['C1', 'C2', 'C3', 'C4'])
print("\n Pandas DataFrame: ")
print(df)
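random.rand() returns different numbers on every run. If the output needs to be reproducible, one option (a substitute for random.rand(), not what the example above uses) is NumPy's Generator API with a fixed seed; the seed value 42 is arbitrary:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)     # seeded generator: same numbers on each run
arry = rng.random(8).reshape(2, 4)  # 8 floats in [0, 1), reshaped to 2 x 4
df = pd.DataFrame(arry, columns=['C1', 'C2', 'C3', 'C4'])
print(df)
```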
Adding a NumPy Array to a pandas DataFrame with tolist()
We can also use the ndarray tolist() method to convert an entire NumPy array into a Python list and assign it as a DataFrame column.
The syntax looks like this: dataframe_object['column_name'] = ndarray_object.tolist(). Here is a code snippet showing how to use it.
import numpy as np
import pandas as pd

df = pd.DataFrame({'value1': [25, 12, 15, 14, 19],
                   'value2': [52, 17, 12, 9, 41],
                   'value3': [10, 30, 15, 11, 14]})
new = np.array([3, 7, 1, 0, 5])

# tolist() turns the array into a plain Python list with one value per row
df['Newcol'] = new.tolist()
print(df)
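The list must contain exactly one value per row. A quick sketch of what happens otherwise: pandas raises a ValueError instead of padding or truncating the column, and the DataFrame is left unchanged.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'value1': [25, 12, 15, 14, 19],
                   'value2': [52, 17, 12, 9, 41]})

too_short = np.array([3, 7, 1])  # 3 values for a 5-row DataFrame
try:
    df['Newcol'] = too_short.tolist()
except ValueError as err:
    print("ValueError:", err)

print('Newcol' in df.columns)  # the column was not added
```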
Creating DataFrames with np.zeros()
We can also create a DataFrame with numpy.zeros(), which returns an array of the given shape filled with zeros; we then pass that ndarray to the DataFrame constructor.
Here is a code snippet showing how to implement it.
import numpy as np
import pandas as pd

# 5 x 3 array filled with 0.0
arry = np.zeros((5, 3))
print("Numpy array:")
print(arry)

df = pd.DataFrame(arry, columns=['C1', 'C2', 'C3'])
print("\n Pandas DataFrame: ")
print(df)
Creating DataFrames from NumPy arrays with random.choice()
Another way to create a DataFrame from a NumPy array is to generate the array with random.choice() and pass it straight to the DataFrame() constructor. Here, np.random.choice(12, (3, 4)) draws integers from 0 to 11 (with replacement) into an array of shape 3 x 4. Here is a script showing how to implement it.
import numpy as np
import pandas as pd

# 3 x 4 array of random integers from 0 to 11
df = pd.DataFrame(np.random.choice(12, (3, 4)), columns=list('ABCD'))
print("\n Pandas DataFrame: ")
print(df)
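random.choice() is not limited to integers. As a sketch, passing a list as the first argument samples from those values instead; the category labels here are made up for illustration:

```python
import numpy as np
import pandas as pd

labels = ['low', 'medium', 'high']  # hypothetical category values

# Draw a 3 x 4 array of labels (sampling with replacement)
df = pd.DataFrame(np.random.choice(labels, (3, 4)), columns=list('ABCD'))
print(df)
```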
Transpose a NumPy array before creating a DataFrame
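We can transpose a NumPy array and then build a DataFrame from its rows. In the example below, transposing turns the 13 x 2 array into a 2 x 13 array, so arry_tp[0] and arry_tp[1] hold the original first and second columns. Here's a code example showing how to implement it.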
import numpy as np
import pandas as pd

arry = np.array([[4, 8], [15, 18], [18, 21], [13, 19], [10, 15], [7, 12], [4, 2],
                 [5, 1], [8, 4], [9, 24], [23, 35], [10, 22], [12, 27]])

# Transpose: shape (13, 2) becomes (2, 13)
arry_tp = arry.transpose()
print(arry_tp)
print()

# Each row of the transposed array becomes one DataFrame column
df = pd.DataFrame({'col1': arry_tp[0], 'col2': arry_tp[1]})
print(df.tail())  # last five rows
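Transposing and then picking rows is one way to get the columns; passing the original array with column names gives the same DataFrame directly. A sketch that checks the two routes agree:

```python
import numpy as np
import pandas as pd

arry = np.array([[4, 8], [15, 18], [18, 21], [13, 19], [10, 15], [7, 12], [4, 2],
                 [5, 1], [8, 4], [9, 24], [23, 35], [10, 22], [12, 27]])

# Route 1: transpose, then use each row of the transposed array as a column
arry_tp = arry.transpose()
df_tp = pd.DataFrame({'col1': arry_tp[0], 'col2': arry_tp[1]})

# Route 2: pass the (13, 2) array directly and name its two columns
df_direct = pd.DataFrame(arry, columns=['col1', 'col2'])

print(df_tp.equals(df_direct))
```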
Create an empty DataFrame from an empty NumPy array
We can create a DataFrame from a NumPy array that stores only NaN (not a number) values, and then replace those values with blank strings so the table looks empty. Here is a code snippet showing how to implement it.
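import pandas as pd
import numpy as np

# A 3 x 4 array in which every value is NaN
arry = np.full((3, 4), np.nan)
df = pd.DataFrame(arry, index=[0, 1, 2], columns=['A', 'B', 'C', 'D'])

# Replace the NaN values with blank strings
df = df.fillna(' ')
print(df)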
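A DataFrame full of NaN values or blank strings still has rows. If you need a DataFrame with no rows at all, a sketch is to start from a zero-row array; its shape (0, 4) still fixes the number of columns:

```python
import numpy as np
import pandas as pd

arry = np.empty((0, 4))  # zero rows, four columns
df = pd.DataFrame(arry, columns=['A', 'B', 'C', 'D'])

print(df.shape)  # (0, 4)
print(df.empty)  # True
```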
Generating a DataFrame through iterations over a NumPy array
We can use list comprehensions inside the DataFrame() constructor to generate the index and column labels, iterating over range(arry.shape[0]) for the rows and range(arry.shape[1]) for the columns (shape is an attribute of the ndarray, not a method).
The result is a DataFrame of the ndarray with descriptive labels. Here is a script that shows how to run it.
import pandas as pd
import numpy as np

arry = np.array([[2, 4, 6], [10, 20, 30]])

# Build row and column labels with list comprehensions over the array's shape
df = pd.DataFrame(data=arry[0:, 0:],
                  index=['Row-' + str(g + 1) for g in range(arry.shape[0])],
                  columns=['Column-' + str(g + 1) for g in range(arry.shape[1])])
print(df)
Visualizing a NumPy array and a pandas DataFrame
We can also visualize a NumPy array or a DataFrame using data visualization libraries like matplotlib. Here is a code snippet showing how to implement it.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

arry = np.array(np.random.random((30, 3)))
print(arry)

# Scatter plot for the NumPy array: the third column sets the point colors
plt.scatter(arry[:, 0], arry[:, 1], c=arry[:, 2], cmap='Purples')

df = pd.DataFrame(arry, columns=['C1', 'C2', 'C3'])
print(df)

# Scatter plot for the DataFrame using two column values, in a new figure
plt.figure()
plt.scatter(df["C1"], df["C2"])
plt.show()
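pandas also has its own plotting interface built on matplotlib. As a sketch, DataFrame.plot.scatter() draws the same C1-versus-C2 scatter plot from column names, and its c argument can color the points by a third column. The Agg backend and the file name scatter.png are assumptions made so the script runs without a display:

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend: render to a file, no window
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

arry = np.random.random((30, 3))
df = pd.DataFrame(arry, columns=['C1', 'C2', 'C3'])

# Scatter plot of C1 against C2, colored by the values in C3
ax = df.plot.scatter(x='C1', y='C2', c='C3', cmap='Purples')
ax.figure.savefig('scatter.png')
```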
Conclusion
Converting data from NumPy arrays to DataFrames is a common, everyday task for data scientists.
We often need to work with a DataFrame instead of a NumPy array, and converting the data is essential to using the pandas library to its full extent.
We hope this tutorial has given you a clear idea of how to create and convert NumPy arrays (ndarrays) to DataFrame.
This tutorial has highlighted how to convert a homogeneous ndarray using the DataFrame constructor, add indices to a converted DataFrame, convert a heterogeneous NumPy array to a DataFrame, and create a DataFrame from a NumPy array by rows and columns.
We also covered techniques such as concatenating NumPy arrays with a pandas DataFrame, adding a NumPy array as a new column in a DataFrame, and using the concat() function.
Finally, we looked at creating an empty DataFrame from a NumPy array and visualizing both the NumPy array and the pandas DataFrame.
Further reading
For more information, see the official docs:
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html
Gaurav Krroy
Gaurav is a Full-Stack (sr.) Technology Content Engineer (6.5 years of experience) and has a deep passion for curating articles, blogs, eBooks, tutorials, infographics and other web content. Other than that, he is engaged in security research and has found bugs for many government and private organizations around the world. He is the author of two books and has contributed to more than 500 articles and blogs. He is a computer science educator and loves spending time on lean programming, data science, privacy, and SEO. In addition to writing, he loves to play table football, read novels, and dance.