Most NLP problems need a large dataset to train a model, but relatively few ready-made datasets are available on the internet. The Faker library helps us create our own datasets.
Topics covered in this article:
- Faker Library and its functionality
- Create our Dataset
1. Install Faker with pip
pip install Faker
2. Importing the faker library
from faker import Faker
#Initialize a Faker instance and assign it to a variable
fake = Faker()
3. Exploring the functionality of the Faker Library
Let's create a name, phone number, email ID, and address dataset using Faker.
print("Phone Number: ",fake.phone_number())
print("Address:" , fake.address())
We can generate random text in many languages (e.g., Spanish, Hindi, Japanese, French, and many more). Refer to the documentation for the full list of locales supported by the Faker library — https://faker.readthedocs.io/en/stable/locales.html.
#Creating random text in a given locale using the Faker library
faker_hi = Faker(['hi_IN'])
for i in range(2):
    print(faker_hi.text())

faker_es = Faker(['es_ES'])
for i in range(2):
    print(faker_es.text())
Creating Custom Dataset using Profile
Let's create fake profiles with the Faker library and load them into a pandas DataFrame.
Creating a dataset of 100 fake profiles using the library:
import pandas as pd #Importing pandas
from faker import Faker #Importing the Faker
fake=Faker() #Initializing the Faker
profile = [fake.profile() for i in range(100)] #Creating 100 fake profiles
df = pd.DataFrame(profile) #Converting into a dataframe
df.head()
For more functionality, refer to the Faker documentation — https://faker.readthedocs.io/en/stable/
In this article, we explored the functionality of the Faker library and created a fake profile dataset. We can use Faker to generate datasets in many languages for ML and NLP projects.