Create fake data with Faker in Python

Fake data is useful in a number of scenarios, such as testing applications or libraries, or demonstrating functionality without any privacy or security concerns. While it is possible to create fake data manually, this is time-consuming and there is a better way. Faker is a library that makes it easy to generate fake data such as names, phone numbers, emails, and many other types of data.

Fake data has many uses, some of which are listed below. In this article, I'll cover the basic use of Faker to create fake data, as well as its use to populate a Pandas DataFrame and generate a chart.

  • Unit testing
  • Populating databases
  • Creating sample reports
  • Simulating financial transactions
  • Load testing for APIs
  • Prototyping and design
  • Testing search functionality
  • Educational environments
  • Anonymizing data


Faker installation and use

Faker can be installed with pip:

pip install faker

Import the Faker class and create an instance of it; fake data is then generated by calling methods on that instance. The following generates a fake name by calling the name method on the instance.

from faker import Faker

fake = Faker()

# fake name
fake.name()

# 'Stefanie Brown'

Generate fake name using Faker



Show all the Faker methods

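The following code lists all the public attributes of the Faker instance, most of which are methods for generating data. With the version of Faker used here there are 297 of them, covering all kinds of data; the exact count depends on the installed version.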

methods = [x for x in dir(fake) if not x.startswith('_')]
print(f"number of methods = {len(methods)}")
# number of methods = 297


# print(methods)
for index in range(0, len(methods), 4):
    print(methods[index].ljust(35, " "), end="")
    try:
        print(methods[index+1].ljust(35, " "), end="")
        print(methods[index+2].ljust(35, " "), end="")
        print(methods[index+3].ljust(35, " "))
    except IndexError:
        break

Methods on Faker:

Methods methods (contd.) methods (contd.) methods (contd.)
aba add_provider address administrative_unit
am_pm android_platform_token ascii_company_email ascii_email
ascii_free_email ascii_safe_email bank_country basic_phone_number
bban binary boolean bothify
bs building_number cache_pattern catch_phrase
century chrome city city_prefix
city_suffix color color_hsl color_hsv
color_name color_rgb color_rgb_float company
company_email company_suffix coordinate country
country_calling_code country_code credit_card_expire credit_card_full
credit_card_number credit_card_provider credit_card_security_code cryptocurrency
cryptocurrency_code cryptocurrency_name csv currency
currency_code currency_name currency_symbol current_country
current_country_code date date_between date_between_dates
date_object date_of_birth date_this_century date_this_decade
date_this_month date_this_year date_time date_time_ad
date_time_between date_time_between_dates date_time_this_century date_time_this_decade
date_time_this_month date_time_this_year day_of_month day_of_week
del_arguments dga domain_name domain_word
dsv ean ean13 ean8
ein email emoji enum
factories file_extension file_name file_path
firefox first_name first_name_female first_name_male
first_name_nonbinary fixed_width format free_email
free_email_domain future_date future_datetime generator_attrs
get_arguments get_formatter get_providers hex_color
hexify hostname http_method iana_id
iban image image_url internet_explorer
invalid_ssn ios_platform_token ipv4 ipv4_network_class
ipv4_private ipv4_public ipv6 isbn10
isbn13 iso8601 items itin
job json json_bytes language_code
language_name last_name last_name_female last_name_male
last_name_nonbinary latitude latlng lexify
license_plate linux_platform_token linux_processor local_latlng
locale locales localized_ean localized_ean13
localized_ean8 location_on_land longitude mac_address
mac_platform_token mac_processor md5 military_apo
military_dpo military_ship military_state mime_type
month month_name msisdn name
name_female name_male name_nonbinary nic_handle
nic_handles null_boolean numerify opera
optional paragraph paragraphs parse
passport_dates passport_dob passport_full passport_gender
passport_number passport_owner password past_date
past_datetime phone_number port_number postalcode
postalcode_in_state postalcode_plus4 postcode postcode_in_state
prefix prefix_female prefix_male prefix_nonbinary
pricetag profile provider providers
psv pybool pydecimal pydict
pyfloat pyint pyiterable pylist
pyobject pyset pystr pystr_format
pystruct pytimezone pytuple random
random_choices random_digit random_digit_above_two random_digit_not_null
random_digit_not_null_or_empty random_digit_or_empty random_element random_elements
random_int random_letter random_letters random_lowercase_letter
random_number random_sample random_uppercase_letter randomize_nb_elements
rgb_color rgb_css_color ripe_id safari
safe_color_name safe_domain_name safe_email safe_hex_color
sbn9 secondary_address seed seed_instance
seed_locale sentence sentences set_arguments
set_formatter sha1 sha256 simple_profile
slug ssn state state_abbr
street_address street_name street_suffix suffix
suffix_female suffix_male suffix_nonbinary swift
swift11 swift8 tar text
texts time time_delta time_object
time_series timezone tld tsv
unique unix_device unix_partition unix_time
upc_a upc_e uri uri_extension
uri_page uri_path url user_agent
user_name uuid4 vin weights
windows_platform_token word words xml
year zip zipcode zipcode_in_state
zipcode_plus4



Fake methods for Name

The list of all the methods on Faker shows that there are a number of methods for generating names. We can use dir again to list only the methods that have name in the method name. This returns 24 methods, but not all of them generate names of people.

[x for x in dir(fake) if not x.startswith('_') and 'name' in x]

Methods in Faker with name in the method name

Fake Names Fake Names (contd.) Fake Names (contd.)
color_name cryptocurrency_name currency_name
domain_name file_name first_name
first_name_female first_name_male first_name_nonbinary
hostname language_name last_name
last_name_female last_name_male last_name_nonbinary
month_name name name_female
name_male name_nonbinary safe_color_name
safe_domain_name street_name user_name

Methods for people names in Faker

Fake Names Fake Names (contd.) Fake Names (contd.)
first_name_female name last_name
last_name_female name_nonbinary last_name_nonbinary
name_male name_female user_name
first_name_male first_name
last_name_male first_name_nonbinary


Generate sample fake names

It can be helpful to see sample data from these methods to identify the best one for the desired fake data. The following code calls each method with name in its name and prints a sample value, which makes it easy to compare the output of the different methods.

for name_method in [x for x in dir(fake) if not x.startswith('_') and 'name' in x]:
    # look up each method by name with getattr instead of eval
    print(f"{name_method.ljust(23, ' ')} :   {getattr(fake, name_method)()}")

Fake data generated for methods with name

Method Name Fake value generated
color_name Navy
cryptocurrency_name Zcash
currency_name Surinamese dollar
domain_name miller-barber.com
file_name ok.webm
first_name Aaron
first_name_female Patricia
first_name_male Jackson
first_name_nonbinary Ricky
hostname web-01.hampton-jenkins.com
language_name Bihari languages
last_name Ray
last_name_female Barker
last_name_male Vega
last_name_nonbinary Morales
month_name August
name Charles Hoover
name_female Kathryn Levy
name_male Timothy Bruce
name_nonbinary Samantha Carr
safe_color_name gray
safe_domain_name example.com
street_name Kevin Harbor
user_name haneydaniel

Generate sample fake names using Faker



Regenerating the same fake data - Seed

It is sometimes necessary to regenerate the same fake data rather than getting different random data on every run, for example when the fake data is used in unit tests. This is easily achieved by setting a seed, similar to seeding a random number generator. With the same seed, the same sequence of method calls produces the same results, provided the same version of Faker is used.

Faker.seed(42)
for i in range(5):
    print(fake.name())
# Allison Hill
# Noah Rhodes
# Angie Henderson
# Daniel Wagner
# Cristian Santos


Faker.seed(42)
for i in range(5):
    print(fake.name())
# Allison Hill
# Noah Rhodes
# Angie Henderson
# Daniel Wagner
# Cristian Santos


Localized fake data

A locale value can be passed into the constructor for Faker to create fake data in that locale.

italian_fake = Faker('it_IT')
for name_method in [x for x in dir(italian_fake) if x.startswith('name')]:
    print(f"{name_method.ljust(23, ' ')} =   {getattr(italian_fake, name_method)()}")

# name                    =   Lando Opizzi
# name_female             =   Elvira Bonanno-Beccheria
# name_male               =   Dott. Jacopo Bottaro
# name_nonbinary          =   Ciro Parmitano

japanese_fake = Faker('ja_JP')
for name_method in [x for x in dir(japanese_fake) if x.startswith('name')]:
    print(f"{name_method.ljust(30, ' ')} =   {getattr(japanese_fake, name_method)()}")

# name                           =   山田 春香
# name_female                    =   小林 知実
# name_male                      =   橋本 稔
# name_nonbinary                 =   斎藤 健一

chinese_fake = Faker('zh_CN')
for name_method in [x for x in dir(chinese_fake) if x.startswith('name')]:
    print(f"{name_method.ljust(30, ' ')} =   {getattr(chinese_fake, name_method)()}")

# name                           =   张春梅
# name_female                    =   权建平
# name_male                      =   贺华
# name_nonbinary                 =   阎玉梅

Create localized Faker instance to generate localized fake data



Create a chart with fake data

Any of the methods on the Faker instance can be used in a list comprehension to generate lists of fake values. These lists can be combined into a dictionary of lists, which is then loaded into a Pandas DataFrame. This is a quick way to generate test data without the risk of exposing sensitive customer data. Once the data is in a DataFrame, it can be manipulated or charted, which is useful when developing data pipelines.

import pandas as pd

Faker.seed(42)
DATA_SIZE = 500
names = [fake.first_name() for _ in range(DATA_SIZE)]
ages = [fake.random_int(min=21, max=80) for _ in range(DATA_SIZE)]
states = [fake.state() for _ in range(DATA_SIZE)]
jobs = [fake.job() for _ in range(DATA_SIZE)]

# one list per column, keyed by column name
data = {'name': names, 'age': ages, 'state': states, 'job': jobs}
df = pd.DataFrame.from_dict(data)

Top 10 records from a DataFrame containing fake data

Name Age State Job
0 Danielle 60 Delaware Surveyor, planning and development
1 Angel 55 Alaska Surveyor, planning and development
2 Joshua 22 South Dakota Landscape architect
3 Jeffrey 79 Rhode Island Community education officer
4 Jill 46 Kansas Paediatric nurse
5 Erica 58 Virginia Manufacturing systems engineer
6 Patricia 57 Hawaii Pharmacologist
7 Christopher 63 Maryland Arts development officer
8 Robert 22 Oklahoma Holiday representative
9 Anthony 26 Alabama Tourism officer

import matplotlib.pyplot as plt
import numpy as np

mean_age_df = df[['state', 'age']].groupby('state').mean()
mean_age_df.rename_axis(None, axis=0, inplace=True)
top_df = mean_age_df.sort_values(by=['age'], ascending=False).head(10)

fig, ax = plt.subplots(figsize=(10, 7), facecolor=plt.cm.Blues(.2))
ax.set_facecolor(plt.cm.Blues(.2))

fig.suptitle('Top 10 States with highest average age',
             fontsize=18,
             fontweight='bold')

states = list(top_df.index)
y_pos = np.arange(len(states))
ages = list(top_df['age'])

ax.barh(y_pos, ages, align='center')
ax.set_yticks(y_pos)
ax.set_yticklabels(states, fontsize=14)
ax.set_xlabel('Average Age', fontsize=14)

# Display highest on top
ax.invert_yaxis()

# Hide the right and top spines
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)

plt.show()

Create bar chart from fake data loaded into a Pandas DataFrame




Conclusion

Fake data can easily be created with Faker in Python. I can't help asking - do we really need more fake data? I think we do, as it is useful in a number of situations, such as unit testing or developing data pipelines without the need for real data. Faker can generate a wide range of data types, and it also supports custom data providers.